PEPTIDE AND PROTEIN FUSIONS TO THIOREDOXIN
AND THIOREDOXIN-LIKE MOLECULES

The invention relates generally to the production of fusion
proteins in prokaryotic and eukaryotic cells. More specifically,
the invention relates to the expression in host cells of
recombinant fusion sequences comprising thioredoxin or
thioredoxin-like sequences fused to sequences for selected
heterologous peptides or proteins, and the use of such fusion
molecules to increase the production, activity, stability or
solubility of recombinant proteins and peptides.

Background of the Invention

Many peptides and proteins can be produced via recombinant
means in a variety of expression systems, e.g., various strains
of bacterial, fungal, mammalian or insect cells. However, when
bacteria are used as host cells for heterologous gene expression,
several problems frequently occur.

For example, heterologous genes encoding small peptides are
often poorly expressed in bacteria. Because of their size, most
small peptides are unable to adopt stable, soluble conformations
and are subject to intracellular degradation by proteases and
peptidases present in the host cell. Those small peptides which
do manage to accumulate when directly expressed in E. coli or
other bacterial hosts are usually found in the insoluble or
"inclusion body" fraction, an occurrence which renders them
almost useless for screening purposes in biological or
biochemical assays.

Moreover, even if small peptides are not produced in
inclusion bodies, the production of small peptides by recombinant
means as candidates for new drugs or enzyme inhibitors encounters
further problems. Even small linear peptides can adopt an
enormous number of potential structures due to their degrees of
conformational freedom. Thus a small peptide can have the
'desired' amino-acid sequence and yet have very low activity in
an assay because the 'active' peptide conformation is only one of
the many alternative structures adopted in free solution. This
presents another difficulty encountered in producing small
heterologous peptides recombinantly for effective research and
therapeutic use.

Inclusion body formation is also frequently observed when
the genes for heterologous proteins are expressed in bacterial
cells. These inclusion bodies usually require further
manipulations in order to solubilize and refold the heterologous
protein, with conditions determined empirically and with
uncertainty in each case. If these additional procedures are not
successful, little to no protein retaining bioactivity can be
recovered from the host cells. Moreover, these additional
processes are often technically difficult and prohibitively
expensive for practical production of recombinant proteins for
therapeutic, diagnostic or other research uses.

To overcome these problems, the art has employed certain
peptides or proteins as fusion "partners" with a desired
heterologous peptide or protein to enable the recombinant
expression and/or secretion of small peptides or larger proteins
as fusion proteins in bacterial expression systems. Among such
fusion partners are included lacZ and trpE fusion proteins,
maltose-binding protein fusions, and glutathione-S-transferase
fusion proteins [See, generally, Current Protocols in Molecular
Biology, Vol. 2, suppl. 10, publ. John Wiley and Sons, New York,
NY, pp. 16.4.1-16.8.1 (1990); and Smith et al, Gene 67:31-40
(1988)]. U.S. Patent 4,801,536 describes the fusion of a
bacterial flagellin protein to a desired protein to enable the
production of a heterologous gene in a bacterial cell and its
secretion into the culture medium as a fusion protein. PCT
Patent Publication WO91/11454 discloses fusion proteins using
biotinylated renin as the fusion partner. The renin is
immobilized on a purification column to facilitate separation and
cleavage.

However, fusions of desired peptides or proteins to other
proteins (i.e., as fusion partners) at the amino- or
carboxyl-termini of these fusion partner proteins often have
other potential disadvantages. Experience in E. coli has shown
that a crucial factor in obtaining high levels of gene expression
is the efficiency of translational initiation. Translational
initiation in E. coli is very sensitive to the nucleotide sequence
surrounding the initiating methionine codon of the desired
heterologous peptide or protein sequence, although the rules
governing this phenomenon are not clear. For this reason,
fusions of sequences at the amino-terminus of many fusion partner
proteins affect expression levels in an unpredictable manner.
In addition, there are numerous amino- and carboxy-peptidases in
E. coli which degrade amino- or carboxyl-terminal peptide
extensions to fusion partner proteins, so that a number of the
known fusion partners have a low success rate for producing
stable fusion proteins.

The purification of proteins produced by recombinant
expression systems is often a serious challenge. There is a
continuing requirement for new and easier methods to produce
homogeneous preparations of recombinant proteins, and yet a
number of the fusion partners currently used in the art possess
no inherent properties that would facilitate the purification
process. Therefore, in the art of recombinant expression
systems, there remains a need for new compositions and processes
for the production and purification of stable, soluble peptides
and proteins for use in research, diagnostic and therapeutic
applications.

Summary of the Invention

In one aspect, the invention provides a fusion sequence
comprising a thioredoxin-like protein sequence fused to a
selected heterologous peptide or protein. The peptide or protein
may be fused to the amino terminus of the thioredoxin-like
sequence, the carboxyl terminus of the thioredoxin-like sequence,
or within the thioredoxin-like sequence (e.g., within the
active-site loop of thioredoxin). The fusion sequence according
to this invention may optionally contain a linker peptide between
the thioredoxin-like sequence and the selected peptide or protein.
This linker provides, where needed, a selected cleavage site or a
stretch of amino acids capable of preventing steric hindrance
between the thioredoxin-like molecule and the selected peptide or
protein.

As another aspect, the invention provides a DNA molecule
encoding the fusion sequence defined above in association with,
and under the control of, an expression control sequence capable
of directing the expression of the fusion protein in a desired
host cell.

Still a further aspect of the invention is a host cell
transformed with, or having integrated into its genome, a DNA
sequence comprising a thioredoxin-like DNA sequence fused to the
DNA sequence of a selected heterologous peptide or protein. This
fusion sequence is desirably under the control of an expression
control sequence capable of directing the expression of a fusion
protein in the cell.

As yet another aspect, there is provided a novel method for
increasing the expression of soluble recombinant proteins. The
method includes culturing under suitable conditions the
above-described host cell to produce the fusion protein.

In one embodiment of this method, if the resulting fusion
protein is cytoplasmic, the cell can be lysed by conventional
means to obtain the soluble fusion protein. More preferably in
the case of cytoplasmic fusion proteins, the method includes
releasing the fusion protein from the host cell by applying
osmotic shock or freeze/thaw treatments to the cell. In this
case the fusion protein is selectively released from the interior
of the cell via the zones of adhesion that exist between the
inner and outer membranes of E. coli. The fusion protein is then
purified by conventional means.

In another embodiment of the method, if a secretory leader
is employed in the fusion protein construct, the fusion protein
can be recovered from a periplasmic extract or from the cell
culture medium.

An additional step in both of these methods is cleavage of
the desired protein from the thioredoxin-like protein by
conventional means.

Other aspects and advantages of the present invention will
be apparent upon consideration of the following detailed
description of preferred embodiments thereof.

Summary of the Drawings 

Fig. 1 illustrates the DNA sequence of the expression
plasmid pALtrxA/EK/IL11ΔPro-581 and the amino acid sequence for
the fusion protein therein, described in Example 1.

Fig. 2 illustrates the DNA sequence and amino acid sequence
of the macrophage inhibitory protein-1α (MIP-1α) protein used in
the construction of a thioredoxin fusion protein described in
Example 3.

Fig. 3 illustrates the DNA sequence and amino acid sequence 
of the bone morphogenetic protein-2 (BMP-2) protein used in the 
construction of a thioredoxin fusion protein described in Example 
4. 

Fig. 4 is a schematic drawing illustrating the insertion of
an enterokinase cleavage site into the active-site loop of E.
coli thioredoxin (trxA) described in Example 5.

Fig. 5 is a schematic drawing illustrating random peptide 
insertions into the active-site loop of E. coli thioredoxin 
(trxA) described in Example 5. 

Fig. 6 illustrates the DNA sequence and amino acid sequence
of the human interleukin-6 (IL-6) protein used in the
construction of a thioredoxin fusion protein described in Example
6.

Fig. 7 illustrates the DNA sequence and amino acid sequence
of the M-CSF protein used in the construction of a thioredoxin
fusion protein described in Example 7.

Detailed Description of the Invention

This invention permits the production of large amounts of
heterologous peptides or proteins in a stable, soluble form in
certain host cells that normally express limited amounts of such
peptides or proteins. It enables release of the fusion protein
from the production cells without the necessity of lysing the
cells, thereby streamlining the purification process. Also, by
using a small peptide insert in an internal region of the
thioredoxin-like sequence (e.g., the active site loop of
thioredoxin) the invention provides a ready cleavage site,
accessible on the surface of the molecule. The fusion proteins
of this invention also permit the desired peptide or protein to
achieve its desired conformation.

According to the present invention, the DNA sequence
encoding a heterologous peptide or protein selected for
expression in a recombinant system is fused to a thioredoxin-like
DNA sequence for expression in the host cell. A thioredoxin-like
DNA sequence is defined herein as a DNA sequence encoding a
protein or fragment of a protein characterized by an amino acid
sequence having at least 18% homology with the amino acid
sequence of E. coli thioredoxin over an amino acid sequence
length of 80 amino acids. Alternatively, a thioredoxin-like DNA
sequence is defined as a DNA sequence encoding a protein or
fragment of a protein characterized by a crystalline structure
substantially similar to that of human or E. coli thioredoxin.
The DNA sequence of glutaredoxin is one such sequence. The amino
acid sequence of E. coli thioredoxin is described in H. Eklund et
al, EMBO J. 3:1443-1449 (1984). The three-dimensional structure
of E. coli thioredoxin is depicted in Fig. 2 of A. Holmgren,
J. Biol. Chem. 264:13963-13966 (1989). Nucleotides 2242-2568 of
Fig. 1 below contain a DNA sequence encoding the E. coli
thioredoxin protein [Lim et al, J. Bacteriol. 163:311-316
(1985)]. The three latter publications are incorporated herein
by reference for the purpose of providing information on
thioredoxin which is known to one of skill in the art.
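
As a rough illustration of the sequence-based definition above, the following Python sketch scores a candidate protein against a reference over an 80-residue window. It assumes that "homology" can be read as simple percent identity over ungapped stretches, which is a simplification. The function names, the exhaustive window scan and the toy sequences are hypothetical and are not taken from this specification.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions carrying identical residues in two
    equal-length, ungapped segments."""
    if len(a) != len(b) or not a:
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def best_window_identity(candidate: str, reference: str, window: int = 80) -> float:
    """Highest ungapped percent identity between any `window`-residue
    stretch of the candidate and any such stretch of the reference."""
    if len(candidate) < window or len(reference) < window:
        raise ValueError("both sequences must be at least `window` residues long")
    best = 0.0
    for i in range(len(candidate) - window + 1):
        for j in range(len(reference) - window + 1):
            score = percent_identity(candidate[i:i + window],
                                     reference[j:j + window])
            best = max(best, score)
    return best


def is_thioredoxin_like(candidate: str, reference: str,
                        threshold: float = 18.0, window: int = 80) -> bool:
    """Apply the 18%-over-80-residues criterion (identity used as a
    stand-in for homology; this is an assumption)."""
    return best_window_identity(candidate, reference, window) >= threshold


# Toy sequences, purely illustrative (not real protein sequences).
toy_reference = "A" * 80
toy_candidate = "A" * 20 + "G" * 60
print(is_thioredoxin_like(toy_candidate, toy_reference))
```

A real comparison would normally use a gapped alignment and a substitution matrix; the ungapped scan is chosen here only to keep the criterion visible.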

As the primary example of a thioredoxin-like protein useful
in this invention, E. coli thioredoxin has the following
characteristics. E. coli thioredoxin is a small protein, only
11.7 kD, and can be expressed to high levels (>10%, corresponding
to a concentration of 15 µM if cells are lysed at 10 A550/ml).
The small size and capacity for high expression of the protein
contribute to a high intracellular concentration. E. coli
thioredoxin is further characterized by a very stable, tight
structure which can minimize the effects on overall structural
stability caused by fusion to the desired peptide or protein.
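
The figures quoted above can be cross-checked with simple arithmetic. The sketch below converts 15 µM of an 11.7 kD protein to a mass concentration and then works out what total protein concentration that would imply. It assumes that ">10%" refers to the fraction of total cellular protein and takes it at its lower bound; the implied total-protein figures are inferences from the quoted numbers, not values stated in this specification, and all names are hypothetical.

```python
# Figures quoted in the text.
MW_DA = 11_700             # E. coli thioredoxin, 11.7 kD
MOLAR_CONC_M = 15e-6       # 15 uM in the lysate
LYSIS_DENSITY_A550 = 10.0  # cells lysed at 10 A550/ml
# Assumption: ">10%" means 10% of total cellular protein (lower bound).
FRACTION_OF_TOTAL = 0.10


def mass_concentration_mg_per_ml(molar_conc_m: float, mw_da: float) -> float:
    """Convert mol/L to mg/ml (g/L and mg/ml are numerically equal)."""
    return molar_conc_m * mw_da


def implied_total_protein_mg_per_ml(target_mg_per_ml: float, fraction: float) -> float:
    """Total protein implied if the target makes up `fraction` of it."""
    return target_mg_per_ml / fraction


trx = mass_concentration_mg_per_ml(MOLAR_CONC_M, MW_DA)
total = implied_total_protein_mg_per_ml(trx, FRACTION_OF_TOTAL)
per_a550 = total / LYSIS_DENSITY_A550

print(f"thioredoxin in lysate: {trx:.2f} mg/ml")
print(f"implied total protein: {total:.2f} mg/ml")
print(f"implied total protein per A550 unit: {per_a550:.2f} mg/ml")
```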

The three-dimensional structure of E. coli thioredoxin is
known. It contains several surface loops, including a unique
active site loop between residues Cys33 and Cys36 which protrudes
from the body of the protein. This active site loop is an
identifiable, accessible surface loop region and is not involved
in any interactions with the rest of the protein that contribute
to overall structural stability. It is therefore a good
candidate as a site for peptide insertions. Both the amino- and
carboxyl-termini of E. coli thioredoxin are on the surface of the
protein, and are readily accessible for fusions.

E. coli thioredoxin is also stable to proteases. Thus, E.
coli thioredoxin may be desirable for use in E. coli expression
systems, because as an E. coli protein it is characterized by
stability to E. coli proteases. E. coli thioredoxin is also
stable to heat up to 80°C and to low pH. Other thioredoxin-like
proteins encoded by thioredoxin-like DNA sequences useful in this
invention may share homologous amino acid sequences and
similar physical and structural characteristics. Thus, DNA
sequences encoding other thioredoxin-like proteins may be used in
place of E. coli thioredoxin according to this invention. For
example, the DNA sequence encoding other species' thioredoxin,
e.g., human thioredoxin, may be employed in the compositions and
methods of this invention. Both the primary sequence and
computer-predicted secondary structures of human and E. coli
thioredoxins are very similar. Human thioredoxin also carries
the same active site loop as is found in the E. coli protein.
Insertions into the human thioredoxin active site loop and on the
amino and carboxyl termini may be as well tolerated as those in
E. coli thioredoxin.

Other thioredoxin-like sequences which may be employed in
this invention include all or portions of the proteins
glutaredoxin and various species' homologs thereof [A. Holmgren,
cited above]. Although E. coli glutaredoxin and E. coli
thioredoxin share less than 20% amino acid homology, the two
proteins do have conformational and functional similarities
[Eklund et al, EMBO J. 3:1443-1449 (1984)].

All or a portion of the DNA sequence encoding protein
disulfide isomerase (PDI) and various species' homologs thereof
[J. E. Edman et al, Nature 317:267-270 (1985)] may also be
employed as a thioredoxin-like DNA sequence, since a repeated
domain of PDI shares >18% homology with E. coli thioredoxin. The
two latter publications are incorporated herein by reference for
the purpose of providing information on glutaredoxin and PDI
which is known and available to one of skill in the art.

Similarly, the DNA sequence encoding
phosphoinositide-specific phospholipase C (PI-PLC), fragments
thereof and various species' homologs thereof [C. F. Bennett et
al, Nature 334:268-270 (1988)] may also be employed in the present
invention as a thioredoxin-like sequence based on the amino acid
sequence homology with E. coli thioredoxin. All or a portion of
the DNA sequence encoding an endoplasmic reticulum protein, such
as Erp72, or various species' homologs thereof are also included
as thioredoxin-like DNA sequences for the purposes of this
invention [R. A. Mazzarella et al, J. Biol. Chem. 265:1094-1101
(1990)] based on amino acid sequence homology. Another
thioredoxin-like sequence is a DNA sequence which encodes all or
a portion of an adult T-cell leukemia-derived factor (ADF) or
other species' homologs thereof [N. Wakasugi et al, Proc. Natl.
Acad. Sci. USA 87:8282-8286 (1990)] based on amino acid sequence
homology to E. coli thioredoxin. The three latter publications
are incorporated herein by reference for the purpose of providing
information on PI-PLC, Erp72, and ADF which are known and
available to one of skill in the art.

It is expected from the definition of thioredoxin-like DNA
sequence used above that other sequences not specifically
identified above, or perhaps not yet identified or published, may
be useful as thioredoxin-like sequences based on their amino acid
sequence similarities to E. coli thioredoxin and characteristic
crystalline structural similarities to E. coli thioredoxin and
the other thioredoxin-like proteins. Based on the above
description, one of skill in the art should be able to select and
identify, or, if desired, modify, a thioredoxin-like DNA sequence
for use in this invention without resort to undue
experimentation. For example, simple point mutations made to
portions of native thioredoxin or native thioredoxin-like
sequences which do not affect the structure of the resulting
molecule are alternative thioredoxin-like sequences, as are
allelic variants of native thioredoxin or native thioredoxin-like
sequences.

DNA sequences which hybridize to the sequence for E. coli
thioredoxin or its structural homologs under either stringent or
relaxed hybridization conditions also encode thioredoxin-like
proteins for use in this invention. Stringent hybridization is
defined herein as hybridization at 4XSSC at 65°C, followed by a
washing in 0.1XSSC at 65°C for an hour. Alternatively, stringent
hybridization is defined as hybridization in 50% formamide, 4XSSC
at 42°C. Non-stringent hybridization is defined herein as
hybridizing in 4XSSC at 50°C, or hybridization with 30-40%
formamide at 42°C. The use of all such thioredoxin-like
sequences is believed to be encompassed in this invention.

Construction of a fusion sequence of the present invention,
which comprises the DNA sequence of a selected peptide or protein
and the DNA sequence of a thioredoxin-like sequence, employs
conventional genetic engineering techniques [see, Sambrook et al,
Molecular Cloning. A Laboratory Manual, Cold Spring Harbor
Laboratory, Cold Spring Harbor, New York (1989)]. Fusion
sequences may be prepared in a number of different ways. For
example, the selected heterologous protein may be fused to the
amino terminus of the thioredoxin-like molecule. Alternatively,
the selected protein sequence may be fused to the carboxyl
terminus of the thioredoxin-like molecule. Small peptide
sequences could also be fused to either of the above-mentioned
positions of the thioredoxin-like sequence to produce them in a
structurally unconstrained manner.

This fusion of a desired heterologous peptide or protein to
the thioredoxin-like protein increases the stability of the
peptide or protein. At either the amino or carboxyl terminus,
the desired heterologous peptide or protein is fused in such a
manner that the fusion does not destabilize the native structure
of either protein. Additionally, fusion to the soluble
thioredoxin-like protein improves the solubility of the selected
heterologous peptide or protein.

It may be preferred for a variety of reasons that peptides
be fused within the active site loop of the thioredoxin-like
molecule. The face of thioredoxin surrounding the active site
loop has evolved, in keeping with the protein's major function as
a nonspecific protein disulfide oxido-reductase, to be able to
interact with a wide variety of protein surfaces. The active
site loop region is found between segments of strong secondary
structure and offers many advantages for peptide fusions. A
small peptide inserted into the active-site loop of a
thioredoxin-like protein is present in a region of the protein
which is not involved in maintaining tertiary structure.
Therefore the structure of such a fusion protein should be
stable. Previous work has shown that E. coli thioredoxin can be
cleaved into two fragments at a position close to the active site
loop, and yet the tertiary interactions stabilizing the protein
remain.

The active site loop of E. coli thioredoxin has the sequence
NH2...Cys33-Gly-Pro-Cys36...COOH. Fusing a selected peptide with
a thioredoxin-like protein in the active loop portion of the 
protein constrains the peptide at both ends, reducing the degrees 
of conformational freedom of the peptide, and consequently 
reducing the number of alternative structures taken by the 
peptide. The inserted peptide is bound at each end by cysteine 
residues, which may form a disulfide linkage to each other as 
they do in native thioredoxin and further limit the 
conformational freedom of the inserted peptide. 
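
The loop insertion described above can be pictured at the amino acid level with a short Python sketch. It assumes that the peptide is placed between the Gly and Pro residues of the Cys-Gly-Pro-Cys motif, so that a cysteine remains on each side of the insert; that exact insertion point, the function name and the toy scaffold string are illustrative assumptions and are not the actual thioredoxin sequence.

```python
ACTIVE_SITE_LOOP = "CGPC"  # Cys33-Gly-Pro-Cys36 in one-letter code


def insert_into_loop(scaffold: str, peptide: str,
                     loop: str = ACTIVE_SITE_LOOP) -> str:
    """Insert `peptide` between the Gly and Pro of the loop motif,
    leaving one flanking cysteine on each side of the insert."""
    idx = scaffold.find(loop)
    if idx == -1:
        raise ValueError("loop motif not found in scaffold")
    if scaffold.find(loop, idx + 1) != -1:
        raise ValueError("loop motif occurs more than once; position is ambiguous")
    cut = idx + 2  # position after Cys-Gly
    return scaffold[:cut] + peptide + scaffold[cut:]


# Toy scaffold, purely illustrative (not the real protein sequence).
toy_scaffold = "AAWCGPCKAA"
print(insert_into_loop(toy_scaffold, "DDDDK"))
```

The resulting string keeps a cysteine on each side of the insert, which is the arrangement that would allow the disulfide linkage mentioned above to constrain the peptide.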

Moreover, this invention places the peptide on the surface 
of the thioredoxin-like protein. Thus the invention provides a 
distinct advantage for use of the peptides in screening for 
bioactive peptide conformations and other assays by presenting 
peptides inserted in the active site loop in this structural 
context. 

Additionally, the fusion of a peptide into the loop protects
it from the actions of E. coli amino- and carboxyl-peptidases.
Further, a restriction endonuclease cleavage site for RsrII
already exists in the portion of the E. coli thioredoxin DNA
sequence encoding the loop region at precisely the correct
position for a peptide fusion [see Figure 4]. RsrII recognizes
the DNA sequence CGG(A/T)CCG, leaving a three-nucleotide-long
5'-protruding sticky end. DNA bearing the complementary sticky
ends will therefore insert at this site in just one orientation.
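
The reason a single orientation results can be seen by writing out the two overhangs. The Python sketch below locates CGG(A/T)CCG sites and reports the three-nucleotide 5' overhang left on each side of the cut. It assumes the commonly cited RsrII cut position, CG^G(A/T)CCG on each strand; the function names are hypothetical, and the example DNA is an illustrative stretch coding for Trp-Cys-Gly-Pro-Cys-Lys rather than a sequence quoted from Fig. 1.

```python
import re

RSRII_SITE = re.compile(r"CGG[AT]CCG")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def rsrii_overhangs(dna: str):
    """Return (site start, downstream overhang, upstream overhang) for
    each RsrII site, both overhangs read 5'->3'.

    Assumes cleavage after the second base of the site on each strand,
    so the central G(A/T)C triplet becomes the single-stranded end.
    """
    results = []
    for match in RSRII_SITE.finditer(dna.upper()):
        downstream = match.group()[2:5]          # top strand of right-hand fragment
        upstream = reverse_complement(downstream)  # bottom strand of left-hand fragment
        results.append((match.start(), downstream, upstream))
    return results


# Illustrative loop-region DNA (assumed codon choice).
example = "TGGTGCGGTCCGTGCAAA"
for start, downstream, upstream in rsrii_overhangs(example):
    print(start, downstream, upstream, downstream != upstream)
```

Because the central base is A or T, the two overhangs differ from one another, so an insert with matching ends can anneal in only one direction.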

A fusion sequence of a thioredoxin-like sequence and a
desired protein or peptide sequence according to this invention
may optionally contain a linker peptide inserted between the
thioredoxin-like sequence and the selected heterologous peptide
or protein. This linker sequence may encode, if desired, a
polypeptide which is selectively cleavable or digestible by
conventional chemical or enzymatic methods. For example, the
selected cleavage site may be an enzymatic cleavage site.
Examples of enzymatic cleavage sites include sites for cleavage
by a proteolytic enzyme, such as enterokinase, Factor Xa,
trypsin, collagenase, and thrombin. Alternatively, the cleavage
site in the linker may be a site capable of being cleaved upon
exposure to a selected chemical, e.g., cyanogen bromide,
hydroxylamine, or low pH.
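
To make the role of a cleavable linker concrete, the following Python sketch splits a fusion at a protease recognition motif. The motifs shown are the commonly cited recognition sequences for enterokinase (Asp-Asp-Asp-Asp-Lys) and Factor Xa (Ile-Glu-Gly-Arg), with cleavage taken to occur after the final residue; this ignores secondary site preferences of the real enzymes. The dictionary, function name and toy fusion string are hypothetical.

```python
# Commonly cited recognition motifs; cleavage assumed after the last residue.
CLEAVAGE_MOTIFS = {
    "enterokinase": "DDDDK",
    "factor_xa": "IEGR",
}


def cleave_after_motif(fusion: str, motif: str):
    """Split `fusion` immediately after the first occurrence of `motif`.

    Returns (carrier_with_linker, released_product). If the motif is
    absent, the fusion is returned intact with an empty product.
    """
    idx = fusion.find(motif)
    if idx == -1:
        return fusion, ""
    cut = idx + len(motif)
    return fusion[:cut], fusion[cut:]


# Toy fusion: carrier + spacer + enterokinase site + product (illustrative only).
toy_fusion = "TRXGSGSDDDDKPEPTIDE"
carrier, product = cleave_after_motif(toy_fusion, CLEAVAGE_MOTIFS["enterokinase"])
print(carrier, product)
```

Placing the site directly before the first residue of the desired peptide means that, in this idealized picture, the released product carries no linker residues.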

Cleavage at the selected cleavage site enables separation of
the heterologous protein or peptide from the thioredoxin fusion
protein to yield the mature heterologous peptide or protein. The
mature peptide or protein may then be obtained in purified form,
free from any polypeptide fragment of the thioredoxin-like
protein to which it was previously linked. The cleavage site, if
inserted into a linker useful in the fusion sequences of this
invention, does not limit this invention. Any desired cleavage
site, of which many are known in the art, may be used for this
purpose.

The optional linker sequence of a fusion sequence of the
present invention may serve a purpose other than the provision of
a cleavage site. The linker may also be a simple amino acid
sequence of a sufficient length to prevent any steric hindrance
between the thioredoxin-like molecule and the selected
heterologous peptide or protein.

Whether or not such a linker sequence is necessary will
depend upon the structural characteristics of the selected
heterologous peptide or protein and whether or not the resulting
fusion protein is useful without cleavage. For example, where
the thioredoxin-like sequence is a human sequence, the fusion
protein may itself be useful as a therapeutic without cleavage of
the selected protein or peptide therefrom. Alternatively, where
the mature protein sequence may be naturally cleaved, no linker
may be needed.

In one embodiment, therefore, the fusion sequence of this
invention contains a thioredoxin-like sequence fused directly at
its amino or carboxyl terminal end to the sequence of the
selected peptide or protein. The resulting fusion protein is
thus a soluble cytoplasmic fusion protein. In another
embodiment, the fusion sequence further comprises a linker
sequence interposed between the thioredoxin-like sequence and the
selected peptide or protein sequence. This fusion protein is
also produced as a soluble cytoplasmic protein. Similarly, where
the selected peptide sequence is inserted into the active site
loop region or elsewhere within the thioredoxin-like sequence, a
cytoplasmic fusion protein is produced.

The cytoplasmic fusion protein can be purified by
conventional means. Preferably, as a novel aspect of the present
invention, several thioredoxin fusion proteins of this invention
may be purified by exploiting an unusual property of thioredoxin.
The cytoplasm of E. coli is effectively isolated from the
external medium by a cell envelope comprising two membranes,
inner and outer, separated from each other by a periplasmic space
within which lies a rigid peptidoglycan cell wall. The
peptidoglycan wall contributes both shape and strength to the
cell. At certain locations in the cell envelope there are "gaps"
(called variously Bayer patches, Bayer junctions or adhesion
sites) in the peptidoglycan wall where the inner and outer
membranes appear to meet and perhaps fuse together. See M. E.
Bayer, J. Bacteriol. 93:1104-1112 (1967) and J. Gen. Microbiol.
53:395-404 (1968). Most of the cellular thioredoxin lies loosely
associated with the inner surface of the membrane at these
adhesion sites and can be quantitatively expelled from the cell
through these adhesion sites by a sudden osmotic shock or by a
simple freeze/thaw procedure. See C. A. Lunn and V. P. Pigiet,
J. Biol. Chem. 257:11424-11430 (1982) and in "Thioredoxin and
Glutaredoxin Systems: Structure and Function", pp. 165-176 (1986),
ed. A. Holmgren et al., Raven Press, New York. To a lesser extent
some EF-Tu (elongation factor-Tu) can be expelled in the same way
[Jacobson et al, Biochemistry 15:2297-2302 (1976)], but, with the
exception of the periplasmic contents, the vast majority of E.
coli proteins cannot be released by these treatments.

Although there have been reports of the release by osmotic
shock of a limited number of heterologous proteins produced in
the cytoplasm of E. coli [Denefle et al, Gene 85:499-510 (1989);
Joseph-Liauzun et al, Gene 86:291-295 (1990); Rosenwasser et al,
J. Biol. Chem. 265:13066-13073 (1990)], the ability to be so
released is a rare and desirable property not shared by the
majority of heterologous proteins. Fusion of a heterologous
protein to thioredoxin as described by the present invention not
only enhances its expression, solubility and stability as
described above, but may also provide for its release from the
cell by osmotic shock or freeze/thaw treatments, greatly
simplifying its purification. The thioredoxin portion of the
fusion protein in some cases, e.g., with MIP, directs the fusion
protein towards the adhesion sites, from where it can be released
to the exterior by these treatments.

In another embodiment the present invention may employ
another component, that is, a secretory leader sequence, among
which many are known in the art, e.g., leader sequences of phoA,
MBP, β-lactamase, operatively linked in frame to the fusion
protein of this invention to enable the expression and secretion
of the mature fusion protein into the bacterial periplasmic space
or culture medium. This leader sequence may be fused to the
amino terminus of the thioredoxin-like molecule when the selected
peptide or protein sequence is fused to the carboxyl terminus or
to an internal site within the thioredoxin-like sequence. An
optional linker could also be present when the peptide or protein
is fused at the carboxyl terminus. It is expected that this
fusion sequence construct, when expressed in an appropriate host
cell, would be expressed as a secreted fusion protein rather than
a cytoplasmic fusion protein. However, stability, solubility and
high expression should characterize fusion proteins produced
using any of these alternative embodiments.

This invention is not limited to any specific type of
heterologous peptide or protein. A wide variety of heterologous
genes or gene fragments are useful in forming the fusion
sequences of the present invention. While the compositions and
methods of this invention are most useful for peptides or
proteins which are not expressed, expressed in inclusion bodies,
or expressed in very small amounts in bacterial and yeast hosts,
the heterologous peptides or proteins can include any peptide or
protein useful for human or veterinary therapy, diagnostic or
research applications in any expression system. For example,
hormones, cytokines, growth or inhibitory factors, enzymes,
modified or wholly synthetic proteins or peptides can be produced
according to this invention in bacterial, yeast, mammalian or
other eukaryotic cells and expression systems suitable therefor.

In the examples below illustrating this invention, the
proteins expressed by this invention include IL-11, MIP-1α, IL-6,
M-CSF, a bone inductive factor called BMP-2, and a variety of
small peptides of random sequence. These proteins include
examples of proteins which, when expressed without a thioredoxin
fusion partner, are unstable in E. coli or are found in inclusion
bodies.

A variety of DNA molecules incorporating the above-described
fusion sequences may be constructed for expressing the
heterologous peptide or protein according to this invention. At
a minimum, a desirable DNA sequence according to this invention
comprises a fusion sequence described above, in association with,
and under the control of, an expression control sequence capable
of directing the expression of the fusion protein in a desired
host cell. For example, where the host cell is an E. coli
strain, the DNA molecule desirably contains a promoter which
functions in E. coli, a ribosome binding site, and optionally, a
selectable marker gene and an origin of replication if the DNA
molecule is extrachromosomal. Numerous bacterial expression
vectors containing these components are known in the art for
bacterial expression, and can easily be constructed by standard
molecular biology techniques. Similarly, known yeast and
mammalian cell vectors and vector components may be utilized
where the host cell is a yeast cell or a mammalian cell.

The DNA molecules containing the fusion sequences may be
further modified to contain different codons to optimize
expression in the selected host cell, as is known in the art.

These DNA molecules may additionally contain multiple copies
of the thioredoxin-like DNA sequence, with the heterologous
protein fused to only one of the DNA sequences, or with the
heterologous protein fused to all copies of the thioredoxin-like
sequence. It may also be possible to integrate a
thioredoxin-like/heterologous peptide or protein-encoding fusion
sequence into the chromosome of a selected host to either replace
or duplicate a native thioredoxin-like sequence.

Host cells suitable for the present invention are preferably
bacterial cells. For example, the various strains of E. coli
(e.g., HB101, W3110 and strains used in the following examples)
are well-known as host cells in the field of biotechnology. E.
coli strain GI724, used in the following examples, has been
deposited with a United States microorganism depository as
described in detail below. Various strains of B. subtilis,
Pseudomonas, and other bacteria may also be employed in this
method.

Many strains of yeast and other eukaryotic cells known to
those skilled in the art may also be useful as host cells for
expression of the polypeptides of the present invention.
Similarly known mammalian cells may also be employed in the
expression of these fusion proteins.

To produce the fusion protein of this invention, the host
cell is either transformed with, or has integrated into its
genome, a DNA molecule comprising a thioredoxin-like DNA sequence
fused to the DNA sequence of a selected heterologous peptide or
protein, desirably under the control of an expression control
sequence capable of directing the expression of a fusion protein.
The host cell is then cultured under known conditions suitable
for fusion protein production. If the fusion protein accumulates
in the cytoplasm of the cell it may be released by conventional
bacterial cell lysis techniques and purified by conventional
procedures including selective precipitations, solubilizations
and column chromatographic methods. If a secretory leader is
incorporated into the fusion molecule substantial purification is
achieved when the fusion protein is secreted into the periplasmic
space or the growth medium.

Alternatively, for cytoplasmic thioredoxin fusion proteins,
a selective release from the cell may be achieved by osmotic
shock or freeze/thaw procedures. Although final purification is
still required for most purposes, the initial purity of fusion
proteins in preparations resulting from these procedures is
superior to that obtained in conventional whole cell lysates,
reducing the number of subsequent purification steps required to
attain homogeneity. In a typical osmotic shock procedure, the
packed cells containing the fusion protein are resuspended on ice
in a buffer containing EDTA and having a high osmolarity, usually
due to the inclusion of a solute that cannot readily cross the
cytoplasmic membrane, such as 20% w/v sucrose.
During a brief incubation on ice the cells plasmolyze as water
leaves the cytoplasm down the osmotic gradient. The cells are
then switched into a buffer of low osmolarity, and during the
osmotic re-equilibration both the contents of the periplasm and
proteins localized at the Bayer patches are released to the
exterior. A simple centrifugation following this release removes
the majority of bacterial cell-derived contaminants from the
fusion protein preparation. Alternatively, in a freeze/thaw
procedure the packed cells containing the fusion protein are
first resuspended in a buffer containing EDTA and are then
frozen. Fusion protein release is subsequently achieved by
allowing the frozen cell suspension to thaw. The majority of
contaminants can be removed as described above by a
centrifugation step. The fusion protein is further purified by
well-known conventional methods.

These treatments typically release at least 30% of the
fusion proteins without lysing the cell cultures. The success of
these procedures in releasing significant amounts of a wide
variety of thioredoxin fusion proteins is surprising, since such
techniques are not generally successful with a wide range of
proteins. The ability of these fusion proteins to be
substantially purified by such treatments, which are
significantly simpler and less expensive than the purification
methods required by other fusion protein systems, may provide the
fusion proteins of the invention with a significant advantage
over other systems which are used to produce proteins in E. coli.

The resulting fusion protein is stable and soluble, often
with the heterologous peptide or protein retaining its
bioactivity. The heterologous peptide or protein may optionally
be separated from the thioredoxin-like protein by cleavage, as
discussed above.

In the specific and illustrative embodiments of the
compositions and methods of this invention, the E. coli
thioredoxin (trxA) gene has been cloned and placed in an E. coli
expression system. An expression plasmid pALtrxA-781 was
constructed. A version of this plasmid containing modified IL-11
fused to the thioredoxin sequence, called pALtrxA/EK/IL11ΔPro-581,
is described below in Example 1 and in Fig. 1. A modified version
of this plasmid containing a different ribosome binding site was
employed in the other examples and is specifically described in
Example 3. Other conventional vectors may be employed in this
invention. The invention is not limited to the plasmids
described in these examples.

Plasmid pALtrxA-781 (without the modified IL-11) directs the
accumulation of >10% of the total cell protein as thioredoxin in
E. coli host strain GI724. Examples 2 through 6 describe the use
of this plasmid to form and express thioredoxin fusion proteins
with BMP-2, IL-6 and MIP-1α, which are polypeptides.

As an example of the expression of small peptides inserted
into the active-site loop, a derivative of pALtrxA-781 has been
constructed in which a 13 amino-acid linker peptide sequence
containing a cleavage site for the specific protease enterokinase
[Leipnieks and Light, J. Biol. Chem. 254:1677-1683 (1979)] has
been fused into the active site loop of thioredoxin. This
plasmid (pALtrxA-EK) directs the accumulation of >10% of the
total cell protein as the fusion protein. The fusion protein is
all soluble, indicating that it has probably adopted a 'native'
tertiary structure. It is as stable as wild type
thioredoxin to prolonged incubations at 80°C, suggesting that the
strong tertiary structure of thioredoxin has not been compromised
by the insertion into the active site loop. The fusion protein
is specifically cleaved by enterokinase, whereas thioredoxin is
not, indicating that the peptide inserted into the active site
loop is present on the surface of the fusion protein.

As described in more detail in Example 5 below, fusions of
small peptides were made into the active site loop of
thioredoxin. The inserted peptides were 14 residues long and
were of totally random composition to test the ability of the
system to deal with hydrophobic, hydrophilic and neutral
sequences.

The methods and compositions of this invention permit the
production of proteins and peptides useful in research,
diagnostic and therapeutic fields. The production of fusion
proteins according to this invention has a number of advantages.
As one example, the production of a selected protein by the
present invention as a carboxyl-terminal fusion to E. coli
thioredoxin, or another thioredoxin-like protein, enables
avoidance of translation initiation problems often encountered in
the production of eukaryotic proteins in E. coli. Additionally,
the initiator methionine usually remaining on the amino-terminus
of the heterologous protein is not present and does not have to
be removed when the heterologous protein is made as a carboxyl
terminal thioredoxin fusion.

The production of fusion proteins according to this
invention reliably improves solubility of desired heterologous
proteins and enhances their stability to proteases in the
expression system. This invention also enables high level
expression of certain desirable therapeutic proteins, e.g., IL-11,
which are otherwise produced at low levels in bacterial host
cells.

This invention may also confer heat stability to the fusion
protein, especially if the heterologous protein itself is heat
stable. Because thioredoxin, and presumably all thioredoxin-like
proteins, are heat stable up to 80°C, the present invention may
enable the use of a simple heat treatment as an initial effective
purification step for some thioredoxin fusion proteins.

In addition to providing high levels of the selected
heterologous proteins or peptides upon cleavage from the fusion
protein for therapeutic or other uses, the fusion proteins or
fusion peptides of the present invention may themselves be useful
as therapeutics. Further, the thioredoxin-like fusion proteins
may provide a vehicle for the delivery of bioactive peptides. As
one example, human thioredoxin would not be antigenic in humans,
and therefore a fusion protein of the present invention with
human thioredoxin may be useful as a vehicle for delivering to
humans the biologically active peptide to which it is fused.
Because human thioredoxin is an intracellular protein, human
thioredoxin fusion proteins may be produced in an E. coli
intracellular expression system. Thus this invention also
provides a method for delivering biologically active peptides or
proteins to a patient in the form of a fusion protein with an
acceptable thioredoxin-like protein.

The present invention also provides methods and reagents for
screening libraries of random peptides for their potential enzyme
inhibitory, hormone/growth factor agonist and hormone/growth
factor antagonist activity. Also provided are methods and
reagents for the mapping of known protein sequences for regions
of potential interest, including receptor binding sites,
substrate binding sites, phosphorylation/modification sites,
protease cleavage sites, and epitopes.

Bacterial colonies expressing thioredoxin-like/random
peptide fusion proteins may be screened using radiolabelled
proteins such as hormones or growth factors as probes. Positives
arising from this type of screen would identify mimics of
receptor binding sites and may lead to the design of compounds
with therapeutic uses. Bacterial colonies expressing
thioredoxin-like/random peptide fusion proteins may also be
screened using antibodies raised against native, active hormones
or growth factors. Positives arising from this type of screen
could be mimics of surface epitopes present on the original
antigen. Where such surface epitopes are responsible for
receptor binding, the 'positive' fusion proteins would have
biological activity.

Additionally, the thioredoxin-like fusion proteins or fusion
peptides of this invention may also be employed to develop
monoclonal and polyclonal antibodies, or recombinant antibodies
or chimeric antibodies, generated by known methods for
diagnostic, purification or therapeutic use. Studies of
thioredoxin-like molecules indicate a possible B cell/T cell
growth factor activity [N. Wakasuki et al, cited above], which
may enhance immune response. The fusion proteins or peptides of
the present invention may be employed as antigens to elicit
desirable antibodies, which themselves may be further manipulated
by known techniques into monoclonal or recombinant antibodies.
Alternatively, antibodies elicited to thioredoxin-like
sequences may also be useful in the purification of many
different thioredoxin fusion proteins.

The following examples illustrate embodiments of the present
invention, but are not intended to limit the scope of the
disclosure.

EXAMPLE 1 - THIOREDOXIN-IL-11 FUSION MOLECULE 

A thioredoxin-like fusion molecule of the present invention
was prepared using E. coli thioredoxin as the thioredoxin-like
sequence and recombinant IL-11 as the selected heterologous
protein. The DNA and amino acid sequences of IL-11 have been
published. See Paul et al, Proc. Natl. Acad. Sci. U.S.A.
87:7512-7516 (1990) and PCT Patent publication WO 91/0749,
published May 30, 1991. IL-11 DNA can be obtained by cloning
based on its published sequence. The E. coli thioredoxin (trxA)
gene was cloned based on its published sequence and employed to
construct various related E. coli expression plasmids using
standard DNA manipulation techniques, described extensively by
Sambrook, Fritsch and Maniatis, Molecular Cloning: A Laboratory
Manual, 2nd edition, Cold Spring Harbor Laboratory, Cold Spring
Harbor, NY (1989).

A first expression plasmid pALtrxA-781 was constructed
containing the E. coli trxA gene without fusion to another
sequence. This plasmid further contained sequences which are
described in detail below for the related IL-11 fusion plasmid.
This first plasmid, which directs the accumulation of >10% of the
total cell protein as thioredoxin in an E. coli host strain
GI724, was further manipulated as described below for the
construction of a trxA/IL-11 fusion sequence.

The entire sequence of the related plasmid expression
vector, pALtrxA/EK/IL11ΔPro-581, is illustrated in Fig. 1 and
contains the following principal features:

Nucleotides 1-2060 contain DNA sequences originating from
the plasmid pUC-18 [Norrander et al, Gene 26:101-106 (1983)]
including sequences containing the gene for β-lactamase which
confers resistance to the antibiotic ampicillin in host E. coli
strains, and a colE1-derived origin of replication. Nucleotides
2061-2221 contain DNA sequences for the major leftward promoter
(pL) of bacteriophage λ [Sanger et al, J. Mol. Biol. 162:729-773
(1982)], including three operator sequences, OL1, OL2 and OL3.
The operators are the binding sites for λ cI repressor protein,
intracellular levels of which control the amount of transcription
initiation from pL. Nucleotides 2222-2241 contain a strong
ribosome binding sequence derived from that of gene 10 of
bacteriophage T7 [Dunn and Studier, J. Mol. Biol. 166:477-535
(1983)].

Nucleotides 2242-2568 contain a DNA sequence encoding the E.
coli thioredoxin protein [Lim et al, J. Bacteriol. 163:311-316
(1985)]. There is no translation termination codon at the end of
the thioredoxin coding sequence in this plasmid.

Nucleotides 2569-2583 contain DNA sequence encoding the
amino acid sequence for a short, hydrophilic, flexible spacer
peptide "-GSGSG-". Nucleotides 2584-2598 provide DNA sequence
encoding the amino acid sequence for the cleavage recognition
site of enterokinase (EC 3.4.4.8), "-DDDDK-" [Maroux et al, J.
Biol. Chem. 246:5031-5039 (1971)].

Nucleotides 2599-3132 contain DNA sequence encoding the
amino acid sequence of a modified form of mature human IL-11
[Paul et al, Proc. Natl. Acad. Sci. USA 87:7512-7516 (1990)],
deleted for the N-terminal prolyl residue normally found in the
natural protein. The sequence includes a translation termination
codon at the 3'-end of the IL-11 sequence.

Nucleotides 3133-3159 provide a "Linker" DNA sequence
containing restriction endonuclease sites. Nucleotides 3160-3232
provide a transcription termination sequence based on that of the
E. coli aspA gene [Takagi et al, Nucl. Acids Res. 13:2063-2074
(1985)]. Nucleotides 3233-3632 are DNA sequences derived from
pUC-18.
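
The feature coordinates listed above can be checked for internal
consistency with a short script. The following is a minimal sketch,
not part of the original disclosure: the feature labels are shorthand
chosen for illustration, and the only data used are the nucleotide
ranges stated in the text (1-based, inclusive). It verifies that the
ranges tile the 3632-nucleotide plasmid without gaps and that each
coding segment is a whole number of codons.

```python
# Feature map transcribed from the nucleotide ranges given in the text.
# Coordinates are 1-based and inclusive; labels are illustrative shorthand.
FEATURES = [
    (1, 2060, "pUC-18 sequences (beta-lactamase, colE1 origin)"),
    (2061, 2221, "lambda pL promoter with operators"),
    (2222, 2241, "T7 gene 10 ribosome binding sequence"),
    (2242, 2568, "thioredoxin (trxA) coding sequence, no stop codon"),
    (2569, 2583, "GSGSG spacer"),
    (2584, 2598, "enterokinase site DDDDK"),
    (2599, 3132, "modified IL-11 coding sequence plus stop codon"),
    (3133, 3159, "linker with restriction sites"),
    (3160, 3232, "aspA transcription terminator"),
    (3233, 3632, "pUC-18 sequences"),
]


def length(start, end):
    """Number of nucleotides in an inclusive 1-based range."""
    return end - start + 1


def check_contiguous(features):
    """True if every feature starts one base after the previous one ends."""
    return all(b[0] == a[1] + 1 for a, b in zip(features, features[1:]))


def codons(start, end):
    """Codon count for a coding range; raises if not a multiple of three."""
    n = length(start, end)
    if n % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return n // 3


if __name__ == "__main__":
    print("contiguous:", check_contiguous(FEATURES))
    for start, end, label in FEATURES:
        print(f"{start:>5}-{end:<5} {length(start, end):>5} nt  {label}")
    print("thioredoxin codons:", codons(2242, 2568))
    print("IL-11 codons (incl. stop):", codons(2599, 3132))
```

The IL-11 segment works out to 178 codons, i.e. 177 residues plus the
termination codon, which matches the residue total of the theoretical
amino acid composition given for IL-11 in Example 2.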

As described in Example 2 below, when cultured under the
appropriate conditions in a suitable E. coli host strain, this
plasmid vector can direct the production of high levels
(approximately 10% of the total cellular protein) of a
thioredoxin-IL-11 fusion protein. By contrast, when not fused to
thioredoxin, IL-11 accumulated to only 0.2% of the total cellular
protein when expressed in an analogous host/vector system.

EXAMPLE 2 - EXPRESSION OF A FUSION PROTEIN 

A thioredoxin-IL-11 fusion protein was produced according to
the following protocol using the plasmid constructed as described
in Example 1. pALtrxA/EK/IL11ΔPro-581 was transformed into the
E. coli host strain GI724 (F-, lacIq, lacPL8, ampC::λcI+) by the
procedure of Dagert and Ehrlich, Gene 6:23 (1979). The
untransformed host strain E. coli GI724 was deposited with the
American Type Culture Collection, 12301 Parklawn Drive,
Rockville, Maryland on January 31, 1991 under ATCC No. 55151 for
patent purposes pursuant to applicable laws and regulations.
Transformants were selected on 1.5% w/v agar plates containing
IMC medium, which is composed of M9 medium [Miller, "Experiments
in Molecular Genetics", Cold Spring Harbor Laboratory, New York
(1972)] supplemented with 0.5% w/v glucose, 0.2% w/v casamino
acids and 100 µg/ml ampicillin.

GI724 contains a copy of the wild-type λ cI repressor gene
stably integrated into the chromosome at the ampC locus, where it
has been placed under the transcriptional control of Salmonella
typhimurium trp promoter/operator sequences. In GI724, λ cI
protein is made only during growth in tryptophan-free media, such
as minimal media or a minimal medium supplemented with casamino
acids such as IMC, described above. Addition of tryptophan to a
culture of GI724 will repress the trp promoter and turn off
synthesis of λ cI, gradually causing the induction of
transcription from pL promoters if they are present in the cell.
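
The regulatory logic of the GI724 host described above can be
summarized as a small truth-table model. This is an illustrative
sketch only: the function and parameter names are hypothetical, and
the model describes the eventual steady state, ignoring the gradual
decay of existing cI repressor that the text mentions.

```python
def pl_transcription_active(tryptophan_present: bool,
                            has_pl_promoter: bool = True) -> bool:
    """Steady-state model of pL induction in a GI724-like host.

    Simplifying assumption: regulation is treated as all-or-nothing,
    whereas in the cell induction follows gradually as cI is depleted.
    """
    # The trp promoter driving cI is repressed when tryptophan is present.
    trp_promoter_active = not tryptophan_present
    # cI repressor is made only while the trp promoter is active.
    ci_repressor_made = trp_promoter_active
    # pL is transcribed only when a pL promoter exists and cI is absent.
    return has_pl_promoter and not ci_repressor_made


if __name__ == "__main__":
    for trp in (False, True):
        state = "on" if pl_transcription_active(trp) else "off"
        print(f"tryptophan present={trp}: pL transcription {state}")
```

In this picture, growth in tryptophan-free IMC medium keeps the pL
promoter silent, and adding tryptophan switches expression on.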

GI724 transformed with pALtrxA/EK/IL11ΔPro-581 was grown at
37°C to an A550 of 0.5 in IMC medium. Tryptophan was added to a
final concentration of 100 µg/ml and the culture incubated for a
further 4 hours. During this time thioredoxin-IL-11 fusion
protein accumulated to approximately 10% of the total cell
protein.

All of the fusion protein was found to be in the soluble
cellular fraction, and was purified as follows. Cells were lysed
in a French pressure cell at 20,000 psi in 50 mM HEPES pH 8.0, 1
mM phenylmethylsulfonyl fluoride. The lysate was clarified by
centrifugation at 15,000 x g for 30 minutes and the supernatant
loaded onto a QAE-Toyopearl column. The flow-through fractions
were discarded and the fusion protein eluted with 50 mM HEPES pH
8.0, 100 mM NaCl. The eluate was adjusted to 2 M NaCl and loaded
onto a column of phenyl-Toyopearl. The flow-through fractions
were again discarded and the fusion protein eluted with 50 mM
HEPES pH 8.0, 0.5 M NaCl.

The fusion protein was then dialyzed against 25 mM HEPES pH
8.0 and was >80% pure at this stage. By T1165 bioassay [Paul et
al, cited above] the purified thioredoxin-IL-11 protein exhibited
an activity of 8 x 10^5 U/mg. This value agrees closely on a molar
basis with the activity of 2 x 10^6 U/mg found for COS cell-derived
IL-11 purified to homogeneity and measured for activity in the
same assay. One milligram of the fusion protein was then cleaved
at 37°C for 20 hours with 1000 units of bovine enterokinase
[Leipnieks and Light, J. Biol. Chem. 254:1677-1683 (1979)] in 1
ml 10 mM Tris-Cl (pH 8.0)/10 mM CaCl2. IL-11 was recovered from
the reaction products by passing them over a QAE-Toyopearl column
in 25 mM HEPES pH 8.0, where homogeneous IL-11 was found in the
flow-through fractions. Uncleaved fusion protein, thioredoxin
and enterokinase remained bound on the column.
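
The molar comparison of specific activities made above can be
sketched numerically. The following is a rough illustration rather
than the calculation actually used: it assumes that mass is
proportional to residue count, and it takes the residue counts from
the coding-segment lengths given in Example 1 (327, 15, 15 and 534
nucleotides, the last including a stop codon). Measured molecular
weights would give a more accurate figure.

```python
# Residue counts derived from the coding-segment lengths in Example 1.
RESIDUES = {
    "thioredoxin": 327 // 3,
    "spacer": 15 // 3,
    "enterokinase_site": 15 // 3,
    "il11": 534 // 3 - 1,  # subtract the stop codon
}


def il11_mass_fraction(residues=RESIDUES):
    """Fraction of the fusion protein mass contributed by IL-11.

    ASSUMPTION: every residue has the same average mass, so the mass
    fraction equals the residue fraction.
    """
    return residues["il11"] / sum(residues.values())


def activity_per_mg_il11(fusion_units_per_mg, residues=RESIDUES):
    """Convert U per mg of fusion protein to U per mg of IL-11 moiety."""
    return fusion_units_per_mg / il11_mass_fraction(residues)


if __name__ == "__main__":
    fusion_activity = 8e5   # U/mg, purified fusion protein (from the text)
    cos_activity = 2e6      # U/mg, COS cell-derived IL-11 (from the text)
    equivalent = activity_per_mg_il11(fusion_activity)
    print(f"IL-11 mass fraction of fusion: {il11_mass_fraction():.2f}")
    print(f"fusion activity per mg IL-11:  {equivalent:.2e} U/mg")
    print(f"ratio to COS-derived IL-11:    {equivalent / cos_activity:.2f}")
```

Under this approximation IL-11 makes up roughly 60% of the fusion
protein, so one milligram of fusion protein carries about 0.6 mg of
IL-11, and the activity per unit of IL-11 is of the same order as
that of the COS cell-derived material.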

The homogeneous IL-11 prepared in this manner had a
bioactivity in the T1165 assay of 2.5 x 10^6 U/mg. Its physical and
chemical properties were determined as follows:

(1) Molecular Weight

The molecular weight of the IL-11 was found to be about 21
kD as measured by 10% SDS-PAGE under reducing conditions (tricine
system) in accordance with the methods of Schagger et al., Anal.
Biochem. 166:368-379 (1987). The compound ran as a single band.

(2) Endotoxin Content

The endotoxin content of the IL-11 was found to be less than
0.1 nanogram per milligram IL-11 in the LAL (Limulus amebocyte
lysate, Pyrotel, available from Associates of Cape Cod, Inc.,
Woods Hole, Massachusetts, U.S.A.) assay, conducted in accordance
with the manufacturer's instructions.

(3) Isoelectric Point

The theoretical isoelectric point of IL-11 is pH 11.70. As
measured by polyacrylamide gel isoelectric focusing using an LKB
Ampholine PAGplate with a pH range from 3.5 to 9.5, the IL-11 ran
at greater than 9.5. An exact measurement could not be taken
because IL-11 is too basic a protein for the reliable gels
available.

(4) Fluorescence Absorption Spectrum

The fluorescence absorption spectrum of the IL-11, as measured
on a 0.1% aqueous solution in a 1 cm quartz cell, showed an
emission maximum at 335-337 nm.

(5) UV Absorption

UV absorption of the IL-11 on a 0.1% aqueous solution in a 1
cm quartz cell showed an absorbance maximum at 278-280 nm.


(6) Amino Acid Composition

The theoretical amino acid composition for IL-11, based on
its amino acid sequence, is as follows:


Amino Acid    Number    Mole %

Ala           20        11.3
Asp           11         6.22
Cys            0
Glu            3         1.70
Phe            1         0.57
Gly           14         7.91
His            4         2.26
Ile            2         1.13
Lys            3         1.70
Leu           41        23.16
Met            2         1.13
Asn            1         0.57
Pro           21        11.86
Gln            7         3.96
Arg           18        10.17
Ser           11         6.22
Thr            9         5.09
Val            5         2.83
Trp            3         1.70
Tyr            1         0.57
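
The mole percentages in the theoretical composition follow directly
from the residue counts. As a minimal sketch (the dictionary below
simply transcribes the counts listed for IL-11, and the helper name
is illustrative), each value is the residue count divided by the
total number of residues. The last digit may differ slightly from
the tabulated figures depending on the rounding convention used.

```python
# Residue counts transcribed from the theoretical composition of IL-11.
COUNTS = {
    "Ala": 20, "Asp": 11, "Cys": 0, "Glu": 3, "Phe": 1,
    "Gly": 14, "His": 4, "Ile": 2, "Lys": 3, "Leu": 41,
    "Met": 2, "Asn": 1, "Pro": 21, "Gln": 7, "Arg": 18,
    "Ser": 11, "Thr": 9, "Val": 5, "Trp": 3, "Tyr": 1,
}


def mole_percent(counts):
    """Return each residue's share of the total residue count, in percent."""
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


if __name__ == "__main__":
    print("total residues:", sum(COUNTS.values()))
    for aa, pct in mole_percent(COUNTS).items():
        print(f"{aa}  {COUNTS[aa]:>3}  {pct:6.2f}")
```

The counts sum to 177 residues, and Glu plus Gln gives the 10 GLX
residues used for normalization in the hydrolysis analysis.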


A sample of homogeneous IL-11 was subjected to vapor phase
hydrolysis as follows:

6 N HCl and 2 N Phenol reagent were added to a hydrolysis
vessel in which tubes containing 45 µl of 1:10 diluted (w/H2O)
IL-11, concentrated to dryness, were inserted. Samples were sealed
under vacuum and hydrolyzed for 36 hours at 110°C. After the
hydrolysis, samples were dried and resuspended in 500 µl Na-S
sample dilution buffer. Amino acid analysis was performed on a
Beckman 7300 automated amino acid analyzer. A cation exchange
column was used for separation of amino acids following post
column derivatization with ninhydrin. Primary amino acids were
detected at 570 nm and secondary amino acids were detected at 440
nm. Eight point calibration curves were constructed for each of
the amino acids.

Because certain amino acids are typically not recovered,
results for only 5 amino acids are given below. Since the
hydrolysis was done without desalting the protein, 100% recovery
was achieved for most of the amino acids.

The relative recovery of each individual amino acid residue
per molecule of recombinant IL-11 was determined by normalizing
GLX = 10 (the predicted number of glutamine and glutamic acid
residues in IL-11 based on cDNA sequence). The value obtained for
the recovery of GLX in picomoles was divided by 10 to obtain the
GLX quotient. Dividing the value obtained for the recovery in
picomoles of each amino acid by the GLX quotient for that sample
gives a number that represents the relative recovery of each
amino acid in the sample, normalized to the quantitative recovery
of GLX residues. The correlation coefficient comparing the
expected versus the average number of residues of each amino acid
observed is greater than 0.985, indicating that the number of
residues observed for each amino acid is in good agreement with
that predicted from the sequence.

     Amino    No. of Residues    No. of Residues    Correlation
     Acids    Calculated         Expected           Coefficient

1    Asp      12.78              12
2    Glu      10.00              10
3    Gly      12.80              14                 0.9852
4    Arg      16.10              18
5    Pro      18.40              21
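
The GLX normalization and the correlation between calculated and
expected residue numbers can be sketched as follows. This is an
illustrative reconstruction, not the analysis software actually used:
the picomole values in the usage example are hypothetical, since the
raw recoveries are not reported, and the correlation is a plain
Pearson coefficient computed from the five rounded values tabulated
above. With only these rounded values the result lands near, though
not necessarily exactly on, the reported coefficient.

```python
from math import sqrt


def normalize_to_glx(picomoles, glx_residues=10):
    """Scale recoveries so that GLX equals its predicted residue count.

    `picomoles` maps amino acid labels to recovered picomoles and must
    contain a "Glx" entry.
    """
    glx_quotient = picomoles["Glx"] / glx_residues
    return {aa: pmol / glx_quotient for aa, pmol in picomoles.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Values from the table: Asp, Glu, Gly, Arg, Pro.
CALCULATED = [12.78, 10.00, 12.80, 16.10, 18.40]
EXPECTED = [12, 10, 14, 18, 21]

if __name__ == "__main__":
    # Hypothetical recoveries, for illustration only.
    example = {"Glx": 500.0, "Gly": 640.0}
    print(normalize_to_glx(example))
    print(f"correlation: {pearson(CALCULATED, EXPECTED):.4f}")
```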
(7) Amino Terminus Sequencing

IL-11 (buffered in 95% acetonitrile TFA) was sequenced using
an ABI 471A protein sequencer (ABI, Inc.) in accordance with the
manufacturer's instructions. Amino terminus sequencing confirmed
that the IL-11 produced from the thioredoxin fusion protein
contained the correct IL-11 amino acid sequence, and only one
amino terminus was observed.

(8) Peptide Mapping 

The IL-11 was cleaved with Endoproteinase Asp-N (Boehringer
Mannheim) (1:500 ratio of Asp-N to IL-11) in 10 mM Tris, pH 8, 1
M urea and 2 mM 4-aminobenzamidine dihydrochloride (PABA), at
37°C for 4 hours. The sample was then run on HPLC on a C4 Vydac
column using an A buffer of 50 mM NaHPO4, pH 4.3, in dH2O, a B
buffer of 100% isopropanol with a gradient at 1 ml/min from 100%A
to 25%A and 75%B (changing 1%/minute). The eluted peptide
fragments were then sequenced using an ABI 471A protein sequencer
(ABI, Inc.) in accordance with the manufacturer's instructions.
Peptide mapping confirmed that the IL-11 produced from the
thioredoxin fusion protein contained the proper IL-11 N-terminal
and C-terminal sequences.
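
The gradient conditions stated above imply a run time and solvent
volume that can be worked out directly. The following is a small
sketch with hypothetical function names, assuming a strictly linear
gradient that starts at 0% B and holds once 75% B is reached.

```python
def gradient_duration_min(start_b=0.0, end_b=75.0, rate_pct_per_min=1.0):
    """Minutes needed to go from start_b to end_b percent buffer B."""
    return (end_b - start_b) / rate_pct_per_min


def percent_b(t_min, start_b=0.0, end_b=75.0, rate_pct_per_min=1.0):
    """Percent buffer B at time t_min, holding at end_b afterwards."""
    return min(end_b, start_b + rate_pct_per_min * t_min)


if __name__ == "__main__":
    flow_ml_per_min = 1.0  # flow rate from the text
    duration = gradient_duration_min()
    print(f"gradient duration: {duration:.0f} min")
    print(f"volume delivered:  {duration * flow_ml_per_min:.0f} ml")
    print(f"%B at 30 min:      {percent_b(30):.0f}")
```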

(9) Solubility

IL-11 protein was tested for solubility in the substances
below with the following results:

Water                 very soluble
Ethyl Alcohol         very soluble
Acetone               very soluble
1M sodium chloride    very soluble
10% sucrose           very soluble

(10) Sugar Composition and Protein/Polysaccharide Content in %

The absence of sugar moieties attached to the polypeptide
backbone of the IL-11 protein is indicated by its amino acid
sequence, which contains none of the typical sugar attachment
sites.

WO 92/13953                                      PCT/US92/00944

EXAMPLE 3 - THIOREDOXIN-MIP FUSION MOLECULE 

Human macrophage inflammatory protein 1α (MIP-1α) was
expressed at high levels in E. coli as a thioredoxin fusion
protein using an expression vector similar to
pALtrxA/EK/IL11ΔPro-581 described in Example 1 above but modified
in the following manner to replace the ribosome binding site of
bacteriophage T7 with that of λ CII. In the plasmid of Example 1,
nucleotides 2222 to 2241 were removed by conventional means.
Inserted in place of those nucleotides was a sequence of
nucleotides formed by nucleotides 35566 to 35472 and 38137 to
38361 from bacteriophage lambda as described in Sanger et al
(1982) cited above. This reference is incorporated by reference
for the purpose of disclosing this sequence. To express a
thioredoxin-MIP-1α fusion the DNA sequence in the thusly-modified
pALtrxA/EK/IL11ΔPro-581 encoding human IL-11 (nucleotides 2599-
3132) is replaced by the 213 nucleotide DNA sequence shown in
Fig. 2 encoding full-length, mature human MIP-1α [Nakao et al,
Mol. Cell. Biol. 10:3646-3658 (1990)].

The host strain and expression protocol used for the
production of thioredoxin-MIP-1α fusion protein are as described
in Example 1. As was seen with the thioredoxin-IL-11 fusion
protein, all of the thioredoxin-MIP-1α fusion protein was found
in the soluble cellular fraction, representing up to 20% of the
total protein.

Cells were lysed as in Example 1 to give a protein
concentration in the crude lysate of 10 mg/ml. This lysate was
then heated at 80°C for 10 min to precipitate the majority of
contaminating E. coli proteins and was clarified by
centrifugation at 130,000 x g for 60 minutes. The pellet was
discarded and the supernatant loaded onto a Mono Q column. The
fusion protein eluted at approximately 0.5 M NaCl from this
column and was >80% pure at this stage. After dialysis to remove
salt the fusion protein could be cleaved by an enterokinase
treatment as described in Example 1 to release MIP-1α.

EXAMPLE 4 - THIOREDOXIN-BMP-2 FUSION MOLECULE

Human Bone Morphogenetic Protein 2 (BMP-2) was expressed at
high levels in E. coli as a thioredoxin fusion protein using the
modified expression vector described in Example 3. The DNA
sequence encoding human IL-11 in the modified
pALtrxA/EK/IL11ΔPro-581 (nucleotides 2599-3132) is replaced by
the 345 nucleotide DNA sequence shown in Fig. 3 encoding full-
length, mature human BMP-2 [Wozney et al, Science 242:1528-1534
(1988)].

In this case the thioredoxin-BMP-2 fusion protein appeared
in the insoluble cellular fraction when strain GI724 containing
the expression vector was grown in medium containing tryptophan
at 37°C. However, when the temperature of the growth medium was
lowered to 20°C the fusion protein was found in the soluble
cellular fraction.


EXAMPLE 5 - THIOREDOXIN-SMALL PEPTIDE FUSION MOLECULES 

Native E. coli thioredoxin was expressed at high levels in
E. coli using strain GI724 containing the same plasmid expression
vector described in Example 3 deleted for nucleotides 2569-3129,
and employing the growth and induction protocol outlined in
Example 1. Under these conditions thioredoxin accumulated to
approximately 10% of the total protein, all of it in the soluble
cellular fraction.

Fig. 4 illustrates insertion of 13 amino acid residues
comprising an enterokinase cleavage site into the active site loop
of thioredoxin, between residues G34 and P35 of the thioredoxin
protein sequence. The fusion protein containing this internal
enterokinase site was expressed at levels equivalent to native
thioredoxin, and was cleaved with an enterokinase treatment as
outlined in Example 1 above. The fusion protein was found to be
as stable as native thioredoxin to heat treatments, being
resistant to a 10 minute incubation at 80°C as described in
Example 4.
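
The effect of enterokinase on a protein carrying an internal
recognition site can be illustrated with a short string-based sketch.
Enterokinase cleaves on the carboxyl side of the lysine in DDDDK. The
example sequence below is made up for illustration and is not the
thioredoxin sequence or the actual 13-residue insert; the function
name is likewise hypothetical.

```python
EK_SITE = "DDDDK"  # enterokinase recognition sequence (one-letter code)


def cleave_after_ek_site(protein, site=EK_SITE):
    """Split a one-letter protein sequence after the first EK site.

    Returns a list with two fragments if the site is present, or a
    single-element list with the intact sequence if it is not.
    """
    idx = protein.find(site)
    if idx == -1:
        return [protein]
    cut = idx + len(site)  # cleavage follows the lysine of the site
    return [protein[:cut], protein[cut:]]


if __name__ == "__main__":
    # Hypothetical flanking residues around an internal site.
    with_site = "MSTAG" + "GSDDDDKGS" + "PVLIA"
    without_site = "MSTAGPVLIA"
    print(cleave_after_ek_site(with_site))
    print(cleave_after_ek_site(without_site))
```

A protein lacking the site is returned intact, mirroring the
observation that thioredoxin itself is not cleaved by enterokinase
while the loop-insertion fusion is.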

Below are listed twelve additional peptide insertions which
were also made into the active site loop of thioredoxin between
G34 and P35. The sequences are each 14 amino acid residues in
length and are random in composition. Each of the thioredoxin
fusion proteins containing these random insertions was made at
levels comparable to native thioredoxin. All of them were found
in the soluble cellular fraction. These peptides include the
following sequences:

Pro-Leu-Gln-Arg-Ile-Pro-Pro-Gln-Ala-Leu-Arg-Val-Glu-Gly,
Pro-Arg-Asp-Cys-Val-Gln-Arg-Gly-Lys-Ser-Leu-Ser-Leu-Gly,
Pro-Met-Arg-His-Asp-Val-Arg-Cys-Val-Leu-His-Gly-Thr-Gly,
Pro-Gly-Val-Arg-Leu-Pro-Ile-Cys-Tyr-Asp-Asp-Ile-Arg-Gly,
Pro-Lys-Phe-Ser-Asp-Gly-Ala-Gln-Gly-Leu-Gly-Ala-Val-Gly,
Pro-Pro-Ser-Leu-Val-Gln-Asp-Asp-Ser-Phe-Glu-Asp-Arg-Gly,
Pro-Trp-Ile-Asn-Gly-Ala-Thr-Pro-Val-Lys-Ser-Ser-Ser-Gly,
Pro-Ala-His-Arg-Phe-Arg-Gly-Gly-Ser-Pro-Ala-Ile-Phe-Gly,
Pro-Ile-Met-Gly-Ala-Ser-His-Gly-Glu-Arg-Gly-Pro-Glu-Gly,
Pro-Asp-Ser-Leu-Arg-Arg-Arg-Glu-Gly-Phe-Gly-Leu-Leu-Gly,
Pro-Ser-Glu-Tyr-Pro-Gly-Leu-Ala-Thr-Gly-His-His-Val-Gly,
and Pro-Leu-Gly-Val-Leu-Gly-Ser-Ile-Trp-Leu-Glu-Arg-Gln-Gly.

The inserted sequences contained examples that were both
hydrophobic and hydrophilic, and examples that contained cysteine
residues. It appears that the active-site loop of thioredoxin
can tolerate a wide variety of peptide insertions resulting in
soluble fusion proteins. Standard procedures can be used to
purify these loop "inserts".
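
The range of character among the twelve inserted peptides can be
examined with a simple hydropathy calculation. The sketch below is
an illustration only: the use of the Kyte-Doolittle hydropathy scale
and of a mean score per peptide are choices made here, not methods
described in this example. It also confirms that each listed
sequence is 14 residues long and reports which ones contain
cysteine.

```python
# Kyte-Doolittle hydropathy values keyed by three-letter residue code.
KD = {
    "Ala": 1.8, "Arg": -4.5, "Asn": -3.5, "Asp": -3.5, "Cys": 2.5,
    "Gln": -3.5, "Glu": -3.5, "Gly": -0.4, "His": -3.2, "Ile": 4.5,
    "Leu": 3.8, "Lys": -3.9, "Met": 1.9, "Phe": 2.8, "Pro": -1.6,
    "Ser": -0.8, "Thr": -0.7, "Trp": -0.9, "Tyr": -1.3, "Val": 4.2,
}

# The twelve loop insertions listed in this example.
PEPTIDES = [
    "Pro-Leu-Gln-Arg-Ile-Pro-Pro-Gln-Ala-Leu-Arg-Val-Glu-Gly",
    "Pro-Arg-Asp-Cys-Val-Gln-Arg-Gly-Lys-Ser-Leu-Ser-Leu-Gly",
    "Pro-Met-Arg-His-Asp-Val-Arg-Cys-Val-Leu-His-Gly-Thr-Gly",
    "Pro-Gly-Val-Arg-Leu-Pro-Ile-Cys-Tyr-Asp-Asp-Ile-Arg-Gly",
    "Pro-Lys-Phe-Ser-Asp-Gly-Ala-Gln-Gly-Leu-Gly-Ala-Val-Gly",
    "Pro-Pro-Ser-Leu-Val-Gln-Asp-Asp-Ser-Phe-Glu-Asp-Arg-Gly",
    "Pro-Trp-Ile-Asn-Gly-Ala-Thr-Pro-Val-Lys-Ser-Ser-Ser-Gly",
    "Pro-Ala-His-Arg-Phe-Arg-Gly-Gly-Ser-Pro-Ala-Ile-Phe-Gly",
    "Pro-Ile-Met-Gly-Ala-Ser-His-Gly-Glu-Arg-Gly-Pro-Glu-Gly",
    "Pro-Asp-Ser-Leu-Arg-Arg-Arg-Glu-Gly-Phe-Gly-Leu-Leu-Gly",
    "Pro-Ser-Glu-Tyr-Pro-Gly-Leu-Ala-Thr-Gly-His-His-Val-Gly",
    "Pro-Leu-Gly-Val-Leu-Gly-Ser-Ile-Trp-Leu-Glu-Arg-Gln-Gly",
]


def residues(peptide):
    """Split a hyphenated three-letter sequence into residue codes."""
    return peptide.split("-")


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle score over all residues of the peptide."""
    res = residues(peptide)
    return sum(KD[r] for r in res) / len(res)


def contains_cys(peptide):
    """True if the peptide includes at least one cysteine."""
    return "Cys" in residues(peptide)


if __name__ == "__main__":
    for p in PEPTIDES:
        flag = "Cys" if contains_cys(p) else "   "
        print(f"{mean_hydropathy(p):+.2f}  {len(residues(p))}  {flag}  {p}")
```

Positive mean scores indicate a more hydrophobic insert and negative
scores a more hydrophilic one, which gives a quick view of the
spread among the sequences tolerated by the active-site loop.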

EXAMPLE 6 - HUMAN INTERLEUKIN-6 

Human interleukin-6 (IL-6) was expressed at high levels in
E. coli as a thioredoxin fusion protein using an expression
vector similar to modified pALtrxA/EK/IL11ΔPro-581 described in
Example 3 above. To express a thioredoxin-IL-6 fusion, the DNA
sequence in modified pALtrxA/EK/IL11ΔPro-581 encoding human IL-11
(nucleotides 2599-3132) is replaced by the 561 nucleotide DNA
sequence shown in Figure 6 encoding full-length, mature human
IL-6 [Hirano et al, Nature 324:73-76 (1986)]. The host strain and
expression protocol used for the production of thioredoxin-IL-6
fusion protein are as described in Example 1.
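The coordinate-based swap described above can be sketched in a few lines. This is an illustrative helper only: `replace_region` is a hypothetical name, coordinates are taken as 1-based and inclusive as in the figures, and the sequences in the usage example are placeholders rather than the real plasmid or IL-6 sequences.

```python
def replace_region(plasmid, start, end, insert):
    """Replace nucleotides start..end (1-based, inclusive) with insert."""
    if not 1 <= start <= end <= len(plasmid):
        raise ValueError("coordinates fall outside the plasmid sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return plasmid[:start - 1] + insert + plasmid[end:]


if __name__ == "__main__":
    # Placeholder sequences: only the lengths mirror the text
    # (a 3632-nt plasmid as numbered in Figure 1, and a 561-nt insert).
    plasmid = "A" * 3632
    il6_insert = "C" * 561
    new_plasmid = replace_region(plasmid, 2599, 3132, il6_insert)
    removed = 3132 - 2599 + 1
    print("removed", removed, "nt; new length", len(new_plasmid))
```

In practice the replacement is made by restriction digestion and ligation, so the code only models the bookkeeping of which positions are exchanged.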

When the fusion protein was synthesized at 37 °C,
approximately 50% of it was found in the "inclusion body" or
insoluble fraction. However, all of the thioredoxin-IL-6 fusion
protein, representing up to 10% of the total cellular protein,
was found in the soluble fraction when the temperature of
synthesis was lowered to 25 °C.

EXAMPLE 7 - HUMAN MACROPHAGE COLONY STIMULATING FACTOR

Human Macrophage Colony Stimulating Factor (M-CSF) was
expressed at high levels in E. coli as a thioredoxin fusion
protein using a modified expression vector similar to
pALtrxA/EK/IL11ΔPro-581 described in Example 3 above.

The DNA sequence encoding human IL-11 in modified
pALtrxA/EK/IL11ΔPro-581 (nucleotides 2599-3135) is replaced by
the 669 nucleotide DNA sequence shown in Fig. 7 encoding the
first 223 amino acids of mature human M-CSF [G. G. Wong et al,
Science 235:1504-1508 (1987)]. The host strain and expression
protocol used for the production of thioredoxin-M-CSF fusion
protein were as described in Example 2 above.

As was seen with the thioredoxin-IL-11 fusion protein, all 
of the thioredoxin-M-CSF fusion protein was found in the soluble 
cellular fraction, representing up to 10% of the total protein. 

EXAMPLE 8 - RELEASE OF FUSION PROTEIN VIA OSMOTIC SHOCK OR
FREEZE/THAW

To determine whether the fusions of heterologous proteins to
thioredoxin according to this invention enable targeting to the
host cell's adhesion sites and permit the release of the fusion
proteins from the cell, the cells were exposed to simple osmotic
shock and freeze/thaw procedures.

Cells overproducing wild-type E. coli thioredoxin, human
thioredoxin, the E. coli thioredoxin-MIP-1α fusion or the E. coli
thioredoxin-IL-11 fusion were used in the following procedures.
For an osmotic shock treatment, cells were resuspended at 2
A550/ml in 20 mM Tris-Cl pH 8.0/2.5 mM EDTA/20% w/v sucrose and
kept cold on ice for 10 minutes. The cells were then pelleted by
centrifugation (12,000 x g, 30 seconds) and gently resuspended in
the same buffer as above but with sucrose omitted. After an
additional 10 minute period on ice, to allow for the osmotic
release of proteins, cells were re-pelleted by centrifugation
(12,000 x g, 2 minutes) and the supernatant ("shockate") examined
for its protein content. Wild-type E. coli thioredoxin and human
thioredoxin were quantitatively released, giving "shockate"
preparations which were >80% pure thioredoxin. More
significantly, >80% of the thioredoxin-MIP-1α and >50% of the
thioredoxin-IL-11 fusion proteins were released by this osmotic
treatment.
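The resuspension step fixes cell density at 2 A550/ml, so the buffer volume scales with the amount of culture harvested. The sketch below shows that arithmetic under the assumption that A550 is proportional to cell density over the range used; the function names and the example culture values are hypothetical and do not come from the original text.

```python
def resuspension_volume_ml(culture_a550, culture_volume_ml, target_a550=2.0):
    """Buffer volume that brings the harvested cells to target_a550 per ml."""
    if culture_a550 <= 0 or culture_volume_ml <= 0 or target_a550 <= 0:
        raise ValueError("all inputs must be positive")
    # Total A550 units harvested, divided by the desired units per ml.
    return culture_a550 * culture_volume_ml / target_a550


def sucrose_grams(buffer_volume_ml, percent_w_v=20.0):
    """Grams of sucrose for a w/v percentage (grams per 100 ml of buffer)."""
    return buffer_volume_ml * percent_w_v / 100.0


if __name__ == "__main__":
    # Hypothetical culture: 10 ml harvested at A550 = 4.0.
    volume = resuspension_volume_ml(4.0, 10.0)
    print("resuspend in", volume, "ml;", sucrose_grams(volume), "g sucrose")
```

The same volume applies to the second, sucrose-free resuspension, since the text specifies the same buffer with sucrose omitted.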

A simple freeze/thaw procedure produced similar results,
releasing thioredoxin fusion proteins selectively, while leaving
most of the other cellular proteins inside the cell. A typical
freeze/thaw procedure entails resuspending cells at 2 A550/ml in
20 mM Tris-Cl pH 8.0/2.5 mM EDTA and quickly freezing the
suspension in dry ice or liquid nitrogen. The frozen suspension
is then allowed to slowly thaw before spinning out the cells
(12,000 x g, 2 minutes) and examining the supernatant for protein.

Although the resultant "shockate" may require additional
purification, the initial "shockate" is characterized by the
absence of nucleic acid contaminants. Compared to an initial
lysate, the "shockate" is significantly purer, and its use
avoids the difficult removal of DNA from bacterial lysates.

Thus, this release step can be substituted for the lysis
step of Example 2. The supernatant obtained after centrifugation
is then further purified in the manner disclosed in that Example.

Numerous modifications and variations of the present
invention are included in the above-identified specification and
are expected to be obvious to one of skill in the art. Such
modifications and alterations to the compositions and processes
of the present invention are believed to be encompassed in the
scope of the claims appended hereto.

WHAT IS CLAIMED IS:

1. A DNA sequence encoding a fusion protein, said sequence 
comprising DNA encoding a thioredoxin-like protein fused to a DNA 
sequence encoding a selected heterologous protein. 

2. A DNA sequence of claim 1 wherein said DNA encoding said 
thioredoxin-like protein comprises the amino terminus of said 
fusion protein. 

3. A DNA sequence of claim 1 wherein said DNA encoding said
thioredoxin-like protein comprises the carboxyl terminus of said 
fusion protein. 

4. A DNA sequence of claim 1, 2 or 3 wherein said DNA encoding
said thioredoxin-like protein is selected from the group
consisting of E. coli thioredoxin and human thioredoxin.

5. A DNA sequence of claim 1, 2 or 3 wherein said DNA encoding
said selected protein is selected from the group consisting of
IL-11, IL-6, Macrophage Inhibitory Protein 1α and Bone
Morphogenic Protein 2.

6. A DNA sequence of claim 1, 2 or 3 additionally comprising a 
linker DNA sequence fused between said DNA encoding said 
thioredoxin-like protein and said DNA encoding said selected 
heterologous protein. 

7. A plasmid DNA molecule comprising a DNA sequence of claims 
1-6, said sequence being under the control of a suitable 
expression control sequence capable of directing the expression 
of a fusion protein in a selected host cell. 


8. An E. coli host cell transformed with, or having integrated
into the genome thereof, a plasmid of claim 7.

9. A method of making a selected heterologous protein
comprising

(a) culturing in a culture medium under suitable conditions 
a host cell of claim 8; 

(b) recovering the fusion protein produced thereby from 
said culture medium; 

(c) cleaving said selected heterologous protein from said 
fusion protein and 

(d) isolating said selected heterologous protein. 

10. IL-11 protein produced by the method of claim 9. 

11. Use of thioredoxin in the method of claim 9. 
FIGURE 1A

pALtrxA/EK/IL11ΔPro-581

GACGAAAGGG CCTCGTGATA CGCCTATTTT TATAGGTTAA 40

TGTCATGATA A?AATGCTTT CTTAGACGTC AGGTGGCACT 80

[illegible] [illegible] AACCCCTATT TGTTTATTTT 120

TCTAAAT[...] [illegible] TATCCGCTCA TGAGACAATA 160

ACCCTGATAA [illegible] AATATTGAAA AAGGAAGAGT 200

[illegible] [illegible] TGTCGCCCTT ATTCCCTTTT 240

TTGCGGCATT [illegible] GTTTTTGCTC ACCCAGAAAC 280

GCTGGTGAAA [illegible] CTGAAGATCA GTTGGGTGCA 320

CGAGTGGGT? [illegible] GGATCTCAAC AGCGGTAAGA 360

TCCTTGAG?G [illegible] GAAGAACGTT TTCCAATGAT 400

GAGCACTTTT AAAGTT?TGC TATGTGGCGC GGTATTATCC 440

CGTATTGACG CCGGGTAAGA GCAACTCGGT CGCCGCATAC 480

ACTATTCT?A GAATGA?TTG GTTGAGTACT CACCAGTCAC 520

AGAAAAGCAT CTTACGGAT? GCATGACAGT AAGAGAATTA 560

TGCAGTGCTG CCATAACCAT GAGTGATAAC ACTGCGGCCA 600

ACTTACTTCT GACAACGATC GGAGGACCGA AGGAGCTAAC 640

[illegible] CACAACATGG GGGATCATGT AACTCGCCTT 680

GATCGTTGGG AACCGGAGCT GAATGAAGCC ATACCAAACG 720

ACGAGCGTGA CACCACGATG CCTGTAGCAA TGGCAACAAC 760

GTTGCGCAAA CTATTAACTG GCGAACTACT TACTCTAGCT 800

TCCCGGCAAC AATTAATAGA CTGGATGGAG GCGGATAAAG 840

TTGCAGGACC ACTTCTGCGC TCGGCCCTTC CGGCTGGCTG 880

GTTTATTGCT GATAAATCTG GAGCCGGTGA GCGTGGGTCT 920

CGCGGTATCA TTGCAGCACT GGGGCCAGAT GGTAAGCCCT 960

CCCGTATCGT AGTTATCTAC ACGACGGGGA GTCAGGCAAC 1000

TATGGATGAA CGAAATAGAC AGATCGCTGA GATAGGTGCC 1040
FIGURE 1B


TCACTGATTA AGCATTGGTA ACTGTCAGAC CAAGTTTACT 1080

CATATATACT TTAGATTGAT TTAAAACTTC ATTTTTAATT 1120

TAAAAGGATC TAGGTGAAGA TCCTTTTTGA TAATCTCATG 1160

ACCAAAATCC CTTAACGTGA GTTTTCGTTC CACTGAGCGT 1200

CAGACCCCGT AGAAAAGATC AAAGGATCTT CTTGAGATCC 1240

TTTTTTTCTG CGCGTAATCT GCTGCTTGCA AACAAAAAAA 1280

CCACCGCTAC CAGCGGTGGT TTGTTTGCCG GATCAAGAGC 1320 

TACCAACTCT TTTTCCGAAG GTAACTGGCT TCAGCAGAGC 1360 

GCAGATACCA AATACTGTCC TTCTAGTGTA GCCGTAGTTA 1400 

GGCCACCACT TCAAGAACTC TGTAGCACCG CCTACATACC 1440 

TCGCTCTGCT AATCCTGTTA CCAGTGGCTG CTGCCAGTGG 1480 

CGATAAGTCG TGTCTTACCG GGTTGGACTC AAGACGATAG 1520 

TTACCGGATA AGGCGCAGCG GTCGGGCTGA ACGGGGGGTT 1560 

CGTGCACACA GCCCAGCTTG GAGCGAACGA CCTACACCGA 1600 

ACTGAGATAC CTACAGCGTG AGCATTGAGA AAGCGCCACG 1640 

CTTCCCGAAG GGAGAAAGGC GGACAGGTAT CCGGTAAGCG 1680 

GCAGGGTCGG AACAGGAGAG CGCACGAGGG AGCTTCCAGG 1720 

GGGAAACGCC TGGTATCTTT ATAGTCCTGT CGGGTTTCGC 1760 

CACCTCTGAC TTGAGCGTCG ATTTTTGTGA TGCTCGTCAG 1800 

GGGGGCGGAG CCTATGGAAA AACGCCAGCA ACGCGGCCTT 1840 

TTTACGGTTC CTGGCCTTTT GCTGGCCTTT TGCTCACATG 1880 

TTCTTTCCTG CGTTATCCCC TGATTCTGTG GATAACCGTA 1920 

TTACCGCCTT TGAGTGAGCT GATACCGCTC GCCGCAGCCG 1960 

AACGACCGAG CGCAGCGAGT CAGTGAGCGA GGAAGCGGAA 2000 

GAGCGCCCAA TACGCAAACC GCCTCTCCCC GCGCGTTGGC 2040 

CGATTCATTA ATGCAGAATT GATCTCTCAC CTACCAAACA 2080 

ATGCCCCCCT GCAAAAAATA AATTCATATA AAAAACATAC 2120 
FIGURE 1C

AGATAACCAT CTGCGGTGAT AAATTATCTC TGGCGGTGTT 2160

GACATAAATA CCACTGGCGG TGATACTGAG CACATCAGCA 2200

GGACGCACTG ACCACCATGA ATTCAAGAAG GAGATATACA 2240

T ATG AGC GAT AAA ATT ATT CAC CTG ACT GAC GAC 2274
Met Ser Asp Lys Ile Ile His Leu Thr Asp Asp
1 5 10

AGT TTT GAC ACG GAT GTA CTC AAA GCG GAC GGG 2307
Ser Phe Asp Thr Asp Val Leu Lys Ala Asp Gly
15 20

GCG ATC CTC GTC GAT TTC TGG GCA GAG TGG TGC 2340
Ala Ile Leu Val Asp Phe Trp Ala Glu Trp Cys
25 30

GGT CCG TGC AAA ATG ATC GCC CCG ATT CTG GAT 2373
Gly Pro Cys Lys Met Ile Ala Pro Ile Leu Asp
35 40

GAA ATC GCT GAC GAA TAT CAG GGC AAA CTG ACC 2406
Glu Ile Ala Asp Glu Tyr Gln Gly Lys Leu Thr
45 50 55

GTT GCA AAA CTG AAC ATC GAT CAA AAC CCT GGC 2439
Val Ala Lys Leu Asn Ile Asp Gln Asn Pro Gly
60 65

ACT GCG CCG AAA TAT GGC ATC CGT GGT ATC CCG 2472
Thr Ala Pro Lys Tyr Gly Ile Arg Gly Ile Pro
70 75

ACT CTG CTG CTG TTC AAA AAC GGT GAA GTG GCG 2505
Thr Leu Leu Leu Phe Lys Asn Gly Glu Val Ala
80 85

GCA ACC AAA GTG GGT GCA CTG TCT AAA GGT CAG 2538
Ala Thr Lys Val Gly Ala Leu Ser Lys Gly Gln
90 95

TTG AAA GAG TTC CTC GAC GCT AAC CTG GCC GGT 2571
Leu Lys Glu Phe Leu Asp Ala Asn Leu Ala Gly
100 105 110

TCT GGT TCT GGT GAT GAC GAT GAC AAA GGT CCA 2604
Ser Gly Ser Gly Asp Asp Asp Asp Lys Gly Pro
115 120
FIGURE 1D


CCA CCA GGT CCA CCT CGA GTT TCC CCA GAC CCT 2637
Pro Pro Gly Pro Pro Arg Val Ser Pro Asp Pro
125 130

CGG GCC GAG CTG GAC AGC ACC GTG CTC CTG ACC 2670
Arg Ala Glu Leu Asp Ser Thr Val Leu Leu Thr
135 140

CGC TCT CTC CTG GCG GAC ACG CGG CAG CTG GCT 2703
Arg Ser Leu Leu Ala Asp Thr Arg Gln Leu Ala
145 150

GCA CAG CTG AGG GAC AAA TTC CCA GCT GAC GGG 2736
Ala Gln Leu Arg Asp Lys Phe Pro Ala Asp Gly
155 160 165

GAC CAC AAC CTG GAT TCC CTG CCC ACC CTG GCC 2769
Asp His Asn Leu Asp Ser Leu Pro Thr Leu Ala
170 175

ATG AGT GCG GGG GCA CTG GGA GCT CTA CAG CTC 2802
Met Ser Ala Gly Ala Leu Gly Ala Leu Gln Leu
180 185

CCA GGT GTG CTG ACA AGG CTG CGA GCG GAC CTA 2835
Pro Gly Val Leu Thr Arg Leu Arg Ala Asp Leu
190 195

CTG TCC TAC CTG CGG CAC GTG CAG TGG CTG CGC 2868
Leu Ser Tyr Leu Arg His Val Gln Trp Leu Arg
200 205

CGG GCA GGT GGC TCT TCC CTG AAG ACC CTG GAG 2901
Arg Ala Gly Gly Ser Ser Leu Lys Thr Leu Glu
210 215 220

CCC GAG CTG GGC ACC CTG CAG GCC CGA CTG GAC 2934
Pro Glu Leu Gly Thr Leu Gln Ala Arg Leu Asp
225 230

CGG CTG CTG CGC CGG CTG CAG CTC CTG ATG TCC 2967
Arg Leu Leu Arg Arg Leu Gln Leu Leu Met Ser
235 240

CGC CTG GCC CTG CCC CAG CCA CCC CCG GAC CCG 3000
Arg Leu Ala Leu Pro Gln Pro Pro Pro Asp Pro
245 250

CCG GCG CCC CCG CTG GCG CCC CCC TCC TCA GCC 3033
Pro Ala Pro Pro Leu Ala Pro Pro Ser Ser Ala
255 260
FIGURE 1E

TGG GGG GGC ATC AGG GCC GCC CAC GCC ATC CTG 3066
Trp Gly Gly Ile Arg Ala Ala His Ala Ile Leu
265 270 275

GGG GGG CTG CAC CTG ACA CTT GAC TGG GCC GTG 3099
Gly Gly Leu His Leu Thr Leu Asp Trp Ala Val
280 285

AGG GGA CTG CTG CTG CTG AAG ACT CGG CTG TGA 3132 
Arg Gly Leu Leu Leu Leu Lys Thr Arg Leu 
290 295 

AAGCTTATCG ATACCGTCGA CCTGCAGTAA TCGTACAGGG 3172 

TAGTACAAAT AAAAAAGGCA CGTCAGATGA CGTGCCTTTT 3212 

TTCTTGTGAG CAGTAAGCTT GGCACTGGCC GTCGTTTTAC 3252 

AACGTCGTGA CTGGGAAAAC CCTGGCGTTA CCCAACTTAA 3292 

TCGCCTTGCA GCACATCCCC CTTTCGCCAG CTGGCGTAAT 3332 

AGCGAAGAGG CCCGCACCGA TCGCCCTTCC CAACAGTTGC 3372 

GCAGCCTGAA TGGCGAATGG CGCCTGATGC GGTATTTTCT 3412 

CCTTACGCAT CTGTGCGGTA TTTCACACCG CATATATGGT 3452 

GCACTCTCAG TACAATCTGC TCTGATGCCG CATAGTTAAG 3492 

CCAGCCCCGA CACCCGCCAA CACCCGCTGA CGCGCCCTGA 3532 

CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC AGACAAGCTG 3572 

TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC 3612 

GTCATCACCG AAACGCGCGA 3632 
FIGURE 2

GCA CCA CTT GCT GCT GAC ACG CCG ACC GCC TGC TGC 36
Ala Pro Leu Ala Ala Asp Thr Pro Thr Ala Cys Cys
1 5 10

TTC AGC TAC ACC TCC CGA CAG ATT CCA CAG AAT TTC 72
Phe Ser Tyr Thr Ser Arg Gln Ile Pro Gln Asn Phe
15 20

ATA GCT GAC TAC TTT GAG ACG AGC AGC CAG TGC TCC 109
Ile Ala Asp Tyr Phe Glu Thr Ser Ser Gln Cys Ser
25 30 35

AAG CCC AGT GTC ATC TTC CTA ACC AAG AGA GGC CGG 145
Lys Pro Ser Val Ile Phe Leu Thr Lys Arg Gly Arg
40 45

CAG GTC TGT GCT GAC CCC AGT GAG GAG TGG GTC CAG 181
Gln Val Cys Ala Asp Pro Ser Glu Glu Trp Val Gln
50 55 60

AAA TAC GTC AGT GAC CTG GAG CTG AGT GCC TAA 214
Lys Tyr Val Ser Asp Leu Glu Leu Ser Ala
65 70


WO 92/13955 


PCT/US92/00944 


FIGURE 3 

BMP-2

CAA GCT AAA CAT AAA CAA CGT AAA CGT CTG AAA TCT 36
Gln Ala Lys His Lys Gln Arg Lys Arg Leu Lys Ser
1 5 10

AGC TGT AAG AGA CAC CCT TTG TAC GTG GAC TTC AGT 72
Ser Cys Lys Arg His Pro Leu Tyr Val Asp Phe Ser
15 20

GAC GTG GGG TGG AAT GAC TGG ATT GTG GCT CCC CCG 109
Asp Val Gly Trp Asn Asp Trp Ile Val Ala Pro Pro
25 30 35

GGG TAT CAC GCC TTT TAC TGC CAC GGA GAA TGC CCT 145
Gly Tyr His Ala Phe Tyr Cys His Gly Glu Cys Pro
40 45

TTT CCT CTG GCT GAT CAT CTG AAC TCC ACT AAT CAT 181
Phe Pro Leu Ala Asp His Leu Asn Ser Thr Asn His
50 55 60

GCC ATT GTT CAG ACG TTG GTC AAC TCT GTT AAC TCT 217
Ala Ile Val Gln Thr Leu Val Asn Ser Val Asn Ser
65 70

AAG ATT CCT AAG GCA TGC TGT GTC CCG ACA GAA CTC 253
Lys Ile Pro Lys Ala Cys Cys Val Pro Thr Glu Leu
75 80

AGT GCT ATC TCG ATG CTG TAC CTT GAC GAG AAT GAA 289
Ser Ala Ile Ser Met Leu Tyr Leu Asp Glu Asn Glu
85 90 95

AAG GTT GTA TTA AAG AAC TAT CAG GAC ATG GTT GTG 325
Lys Val Val Leu Lys Asn Tyr Gln Asp Met Val Val
100 105

GAG GGT TGT GGG TGT CGC TAG 346
Glu Gly Cys Gly Cys Arg
110


FIGURE 4

INSERTION OF AN ENTEROKINASE SITE INTO
THE ACTIVE-SITE LOOP OF E. COLI THIOREDOXIN (trxA)

                           RsrII
                             |
trxA active    ....GAGTGGTGCGGTCCGTGCAAAATG....
site loop      ....CTCACCACGCCAGGCACGTTTTAC....
                    E  W  C  G  P  C  K  M
                    31                   38

RsrII cut      ....GAGTGGTGCG    GTCCGTGCAAAATG....
               ....CTCACCACGCCAG    GCACGTTTTAC....
                    E  W  C  G      P  C  K  M
                    31                       38

Enterokinase site
(13 residues)

               gtcactccGACTACAAAGACGACGACGACAAAgcttctg
                  tgaggCTGATGTTTCTGCTGCTGCTGTTTcgaagaccag
                  H  S  D  Y  K  D  D  D  D  K  A  S  G ...
                                              ^
                                        cleavage site
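The reading frame across the RsrII junctions in Figure 4 can be checked by translating the ligated top strand. The following is a minimal sketch under stated assumptions: the constant and function names are hypothetical, the codon table is the standard genetic code, and the cut site is taken to lie immediately after the Asp-Asp-Asp-Asp-Lys motif, as marked in the figure.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Top-strand fragments as drawn in Figure 4 (names are hypothetical).
LEFT_ARM = "GAGTGGTGCG"        # trxA sequence upstream of the RsrII cut
RIGHT_ARM = "GTCCGTGCAAAATG"   # trxA sequence downstream of the RsrII cut
LINKER_TOP = "gtcactccGACTACAAAGACGACGACGACAAAgcttctg"


def translate(dna):
    """Translate complete codons from the first base; ignore a partial tail."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def split_at_enterokinase_site(protein, motif="DDDDK"):
    """Split a protein directly after the first occurrence of the motif."""
    cut = protein.index(motif) + len(motif)
    return protein[:cut], protein[cut:]


if __name__ == "__main__":
    native_loop = translate(LEFT_ARM + RIGHT_ARM)
    modified_loop = translate(LEFT_ARM + LINKER_TOP + RIGHT_ARM)
    print(native_loop)
    print(modified_loop)
    print(split_at_enterokinase_site(modified_loop))
```

Because the linker is 39 nucleotides long, it adds exactly 13 codons and leaves the downstream trxA codons in frame.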


FIGURE 5

RANDOM PEPTIDE INSERTIONS INTO THE ACTIVE-SITE
LOOP OF E. COLI THIOREDOXIN (trxA)

                           RsrII
                             |
trxA active    ....GAGTGGTGCGGTCCGTGCAAAATG....
site loop      ....CTCACCACGCCAGGCACGTTTTAC....
                    E  W  C  G  P  C  K  M
                    31                   38

RsrII cut      ....GAGTGGTGCG    GTCCGTGCAAAATG....
               ....CTCACCACGCCAG    GCACGTTTTAC....
                    E  W  C  G      P  C  K  M
                    31                       38

                     (AvaII)           AvaII
            5'          |                |                3'
Oligos         GACTGACTGGTCCG...(N36)...GGTCCTCAGTCAGTCAG
                                        CCAGGAGTCAGTCAGTC
                                        3'             5'

random         GTCCG...(N36)...G
duplex            GC...(N36)...CCAG

insertion into trxA active site loop

....GAGTGGTGCGGTCCG...(N36)...GGTCCGTGCAAAATG....
....CTCACCACGCCAGGC...(N36)...CCAGGCACGTTTTAC....
     E  W  C  G  P  ..(X12)..  G  P  C  K  M
     31                                    38
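The library construction in Figure 5 can be mimicked in silico by placing a random 36-nucleotide stretch between the two flanking trxA segments and translating the result. This is an illustrative sketch only: the names are hypothetical, the standard genetic code is assumed, and a real N36 library would also contain inserts with in-frame stop codons, which appear here as "*".

```python
import random
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Flanking top-strand sequences from the bottom panel of Figure 5.
LEFT_FLANK = "GAGTGGTGCGGTCCG"    # encodes E W C G P
RIGHT_FLANK = "GGTCCGTGCAAAATG"   # encodes G P C K M


def translate(dna):
    """Translate complete codons from the first base; ignore a partial tail."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def random_insert(rng, length=36):
    """Draw a random nucleotide stretch, standing in for (N36)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def loop_with_insert(insert):
    """Translate the active-site loop carrying the given insert."""
    return translate(LEFT_FLANK + insert + RIGHT_FLANK)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same inserts
    for _ in range(3):
        print(loop_with_insert(random_insert(rng)))
```

Each translated loop has the form EWCGP, twelve variable residues, then GPCKM, which is consistent with the 14-residue inserts of Example 5 being written from the flanking Pro through the flanking Gly.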
FIGURE 6

IL-6

5 10
ATG GCT CCA GTA CCT CCA GGT GAA GAT TCT AAA GAT GTA 39
Met Ala Pro Val Pro Pro Gly Glu Asp Ser Lys Asp Val

15 20 25
GCC GCC CCA CAC AGA CAG CCA CTC ACC TCT TCA GAA CGA 78
Ala Ala Pro His Arg Gln Pro Leu Thr Ser Ser Glu Arg

30 35
ATT GAC AAA CAA ATT CGG TAC ATC CTC GAC GGC ATC TCA 117
Ile Asp Lys Gln Ile Arg Tyr Ile Leu Asp Gly Ile Ser

40 45 50
GCC CTG AGA AAG GAG ACA TGT AAC AAG AGT AAC ATG TGT 156
Ala Leu Arg Lys Glu Thr Cys Asn Lys Ser Asn Met Cys

55 60
GAA AGC AGC AAA GAG GCA CTG GCA GAA AAC AAC CTG AAC 195
Glu Ser Ser Lys Glu Ala Leu Ala Glu Asn Asn Leu Asn

65 70 75
CTT CCA AAG ATG GCT GAA AAA GAT GGA TGC TTC CAA TCT 234
Leu Pro Lys Met Ala Glu Lys Asp Gly Cys Phe Gln Ser

80 85 90
GGA TTC AAT GAG GAG ACT TGC CTG GTG AAA ATC ATC ACT 273
Gly Phe Asn Glu Glu Thr Cys Leu Val Lys Ile Ile Thr

95 100
GGT CTT TTG GAG TTT GAG GTA TAC CTA GAG TAC CTC CAG 312
Gly Leu Leu Glu Phe Glu Val Tyr Leu Glu Tyr Leu Gln

105 110 115
AAC AGA TTT GAG AGT AGT GAG GAA CAA GCC AGA GCT GTG 351
Asn Arg Phe Glu Ser Ser Glu Glu Gln Ala Arg Ala Val

120 125
CAG ATG AGT ACA AAA GTC CTG ATC CAG TTC CTG CAG AAA 390
Gln Met Ser Thr Lys Val Leu Ile Gln Phe Leu Gln Lys
FIGURE 6 (continued)

130 140 150
AAG GCA AAG AAT CTA GAT GCA ATA ACC ACC CCT GAC CCA 429
Lys Ala Lys Asn Leu Asp Ala Ile Thr Thr Pro Asp Pro

155 160
ACC ACA AAT GCC AGC CTG CTG ACG AAG CTG CAG GCA CAG 468
Thr Thr Asn Ala Ser Leu Leu Thr Lys Leu Gln Ala Gln

170 175 180
AAC CAG TGG CTG CAG GAC ATG ACA ACT CAT CTC ATT CTG 507
Asn Gln Trp Leu Gln Asp Met Thr Thr His Leu Ile Leu

185 190
CGC AGC TTT AAG GAG TTC CTG CAG TCC AGC CTG AGG GCT 546
Arg Ser Phe Lys Glu Phe Leu Gln Ser Ser Leu Arg Ala

195
CTT CGG CAA ATG TAG 561
Leu Arg Gln Met *
FIGURE 7

1 GAAGAAGTTT CTGAATATTG TAGCCACATG ATTGGGAGTG GACACCTGCA 
51 GTCTCTGCAG CGGCTGATTG ACAGTCAGAT GGAGACCTCG TGCCAAATTA 
101 CATTTGAGTT TGTAGACCAG GAACAGTTGA AAGATCCAGT GTGCTACCTT 
151 AAGAAGGGAT TTCTCCTGGT ACAAGACATA ATGGAGGACA CCATGCGCTT 
201 GAGAGATAAC ACCCCCAATG CCATCGCCAT TGTGCAGCTG CAGGAACTCT 
251 CTTTGAGGCT GAAGAGCTGC TTCACCAAGG ATTATGAAGA GCATGACAAG 
301 GCCTGCGTCC GAACTTTCTA TGAGACACCT CTCCAGTTGC TGGAGAAGGT 
351 CAAGAATGTC TTTAATGAAA CAAAGAATCT CCTTGACAAG GACTGGAATA 
401 TTTTGAGCAA GAACTGCAAC AACAGCTTTG CTGAATGCTC CAGCCAAGAT 
451 GTGGTGACCA AGCCTGATTG CAACTGCCTG TACCCCAAAG CCATCCCTAG 
501 CAGTGACCCG GCCTCTGTCT CCCCTCATCA GCCCCTCGCC CCCTCCATGG 
551 CCCCTGTGGC TGGCTTGACC TGGGAGGACT CTGAGGGAAC TGAGGGCAGC 
601 TCCCTCTTGC CTGGTGAGCA GCCCCTGCAC AGAGTGGATC CAGGCAGTGC 
651 CAAGCAGCGG CCACCCAGG 